Overview

Dataset statistics

Number of variables: 29
Number of observations: 128367
Missing cells: 1123851
Missing cells (%): 30.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 122.5 MiB
Average record size in memory: 1000.4 B
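The missing-cell percentage follows from the other figures in this block: total cells are variables times observations, and the percentage is missing cells over total cells. A minimal sketch of that arithmetic, with variable names chosen here purely for illustration:

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 29
n_observations = 128_367
missing_cells = 1_123_851

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations

# Share of cells that hold no value, as a percentage.
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```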

Variable types

Categorical: 22
Numeric: 7

Warnings

CRASH DATE has a high cardinality: 430 distinct values High cardinality
CRASH TIME has a high cardinality: 1440 distinct values High cardinality
LOCATION has a high cardinality: 52867 distinct values High cardinality
ON STREET NAME has a high cardinality: 4776 distinct values High cardinality
CROSS STREET NAME has a high cardinality: 5270 distinct values High cardinality
OFF STREET NAME has a high cardinality: 28958 distinct values High cardinality
CONTRIBUTING FACTOR VEHICLE 1 has a high cardinality: 55 distinct values High cardinality
VEHICLE TYPE CODE 1 has a high cardinality: 414 distinct values High cardinality
VEHICLE TYPE CODE 2 has a high cardinality: 427 distinct values High cardinality
VEHICLE TYPE CODE 3 has a high cardinality: 77 distinct values High cardinality
LATITUDE is highly correlated with LONGITUDE High correlation
LONGITUDE is highly correlated with LATITUDE High correlation
NUMBER OF PERSONS INJURED is highly correlated with NUMBER OF MOTORIST INJURED High correlation
NUMBER OF MOTORIST INJURED is highly correlated with NUMBER OF PERSONS INJURED High correlation
NUMBER OF MOTORIST KILLED is highly correlated with NUMBER OF PERSONS KILLED High correlation
NUMBER OF PERSONS KILLED is highly correlated with NUMBER OF MOTORIST KILLED High correlation
BOROUGH has 44591 (34.7%) missing values Missing
ZIP CODE has 44600 (34.7%) missing values Missing
LATITUDE has 10096 (7.9%) missing values Missing
LONGITUDE has 10096 (7.9%) missing values Missing
LOCATION has 10096 (7.9%) missing values Missing
ON STREET NAME has 33769 (26.3%) missing values Missing
CROSS STREET NAME has 68115 (53.1%) missing values Missing
OFF STREET NAME has 94598 (73.7%) missing values Missing
CONTRIBUTING FACTOR VEHICLE 2 has 28507 (22.2%) missing values Missing
CONTRIBUTING FACTOR VEHICLE 3 has 116070 (90.4%) missing values Missing
CONTRIBUTING FACTOR VEHICLE 4 has 125009 (97.4%) missing values Missing
CONTRIBUTING FACTOR VEHICLE 5 has 127377 (99.2%) missing values Missing
VEHICLE TYPE CODE 2 has 39742 (31.0%) missing values Missing
VEHICLE TYPE CODE 3 has 116772 (91.0%) missing values Missing
VEHICLE TYPE CODE 4 has 125156 (97.5%) missing values Missing
VEHICLE TYPE CODE 5 has 127407 (99.3%) missing values Missing
LATITUDE is highly skewed (γ1 = -26.76189702) Skewed
LONGITUDE is highly skewed (γ1 = 26.84483109) Skewed
OFF STREET NAME is uniformly distributed Uniform
COLLISION_ID has unique values Unique
NUMBER OF PERSONS INJURED has 90759 (70.7%) zeros Zeros
NUMBER OF PEDESTRIANS INJURED has 120992 (94.3%) zeros Zeros
NUMBER OF MOTORIST INJURED has 103780 (80.8%) zeros Zeros

Reproduction

Analysis started: 2021-03-16 03:34:57.922404
Analysis finished: 2021-03-16 03:35:53.489375
Duration: 55.57 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml
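The reported duration can be checked against the two timestamps. The sketch below uses only the standard library; the format string is an assumption inferred from how the timestamps are printed in this report.

```python
from datetime import datetime

# Assumed format, inferred from the timestamps as printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-03-16 03:34:57.922404", FMT)
finished = datetime.strptime("2021-03-16 03:35:53.489375", FMT)

# Elapsed wall-clock time between the two timestamps, in seconds.
duration = (finished - started).total_seconds()

print(f"{duration:.2f} seconds")
```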

Variables

CRASH DATE
Categorical

HIGH CARDINALITY

Distinct: 430
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 8.2 MiB

01/18/2020: 774
03/06/2020: 673
02/14/2020: 632
02/07/2020: 604
02/27/2020: 581
Other values (425): 125103

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 1283670
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 01/02/2020
2nd row: 01/02/2020
3rd row: 01/02/2020
4th row: 01/02/2020
5th row: 01/02/2020
Value | Count | Frequency (%)
01/18/2020 | 774 | 0.6%
03/06/2020 | 673 | 0.5%
02/14/2020 | 632 | 0.5%
02/07/2020 | 604 | 0.5%
02/27/2020 | 581 | 0.5%
02/10/2020 | 572 | 0.4%
01/17/2020 | 566 | 0.4%
03/02/2020 | 562 | 0.4%
02/03/2020 | 559 | 0.4%
01/21/2020 | 556 | 0.4%
Other values (420) | 122288 | 95.3%
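CRASH DATE is profiled as a categorical string, but the values follow a month/day/year pattern (days such as 18 and 27 rule out day-first order). As a rough sketch, the strings can be parsed into dates and grouped by weekday; the helper name `parse_crash_date` is hypothetical, and only the five most frequent dates from this report are used.

```python
from collections import Counter
from datetime import date, datetime

# Five most frequent CRASH DATE values and their counts, copied from the report.
top_dates = {
    "01/18/2020": 774,
    "03/06/2020": 673,
    "02/14/2020": 632,
    "02/07/2020": 604,
    "02/27/2020": 581,
}

def parse_crash_date(text: str) -> date:
    """Parse an MM/DD/YYYY string (hypothetical helper)."""
    return datetime.strptime(text, "%m/%d/%Y").date()

# Sum crash counts per weekday; weekday() returns 0 for Monday, 6 for Sunday.
crashes_by_weekday = Counter()
for text, count in top_dates.items():
    crashes_by_weekday[parse_crash_date(text).weekday()] += count

print(dict(crashes_by_weekday))
```

Parsing the column this way would let a profiler treat it as a date rather than as 430 unrelated categories.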
[Figure: histogram of lengths of the category]

Most occurring characters

ValueCountFrequency (%)
0403631
31.4%
2338480
26.4%
/256734
20.0%
1130923
 
10.2%
331595
 
2.5%
822285
 
1.7%
721701
 
1.7%
921656
 
1.7%
620649
 
1.6%
518632
 
1.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1026936
80.0%
Other Punctuation256734
 
20.0%

Most frequent character per category

ValueCountFrequency (%)
0403631
39.3%
2338480
33.0%
1130923
 
12.7%
331595
 
3.1%
822285
 
2.2%
721701
 
2.1%
921656
 
2.1%
620649
 
2.0%
518632
 
1.8%
417384
 
1.7%
ValueCountFrequency (%)
/256734
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1283670
100.0%

Most frequent character per script

ValueCountFrequency (%)
0403631
31.4%
2338480
26.4%
/256734
20.0%
1130923
 
10.2%
331595
 
2.5%
822285
 
1.7%
721701
 
1.7%
921656
 
1.7%
620649
 
1.6%
518632
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1283670
100.0%

Most frequent character per block

ValueCountFrequency (%)
0403631
31.4%
2338480
26.4%
/256734
20.0%
1130923
 
10.2%
331595
 
2.5%
822285
 
1.7%
721701
 
1.7%
921656
 
1.7%
620649
 
1.6%
518632
 
1.5%

CRASH TIME
Categorical

HIGH CARDINALITY

Distinct: 1440
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.6 MiB

0:00: 2110
15:00: 1572
16:00: 1521
14:00: 1453
17:00: 1445
Other values (1435): 120266

Length

Max length: 5
Median length: 5
Mean length: 4.732259849
Min length: 4

Characters and Unicode

Total characters: 607466
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0:00
2nd row: 12:57
3rd row: 15:00
4th row: 15:10
5th row: 17:30
Value | Count | Frequency (%)
0:00 | 2110 | 1.6%
15:00 | 1572 | 1.2%
16:00 | 1521 | 1.2%
14:00 | 1453 | 1.1%
17:00 | 1445 | 1.1%
13:00 | 1418 | 1.1%
18:00 | 1360 | 1.1%
12:00 | 1359 | 1.1%
10:00 | 1251 | 1.0%
9:00 | 1182 | 0.9%
Other values (1430) | 113696 | 88.6%
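The 1440 distinct values match the number of minutes in a day, and the mean length below 5 comes from hours that are not zero-padded (for example `9:00`). The sketch below converts a time string to minutes since midnight and recovers the number of four-character rows from the reported totals; the function name is hypothetical.

```python
def minutes_since_midnight(text: str) -> int:
    """Convert an 'H:MM' or 'HH:MM' string to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)

# A day has 24 * 60 minutes, which matches the reported distinct count.
distinct_possible = 24 * 60

# Figures copied from the report.
n_rows = 128_367
total_characters = 607_466

# If every row had 5 characters the total would be 5 * n_rows;
# each 4-character row (single-digit hour) lowers the total by one.
four_char_rows = 5 * n_rows - total_characters

print(minutes_since_midnight("17:30"), distinct_possible, four_char_rows)
```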
[Figure: histogram of lengths of the category]

Most occurring characters

ValueCountFrequency (%)
:128367
21.1%
0114253
18.8%
1111181
18.3%
253789
8.9%
553150
8.7%
342929
 
7.1%
434124
 
5.6%
819275
 
3.2%
717325
 
2.9%
916636
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number479099
78.9%
Other Punctuation128367
 
21.1%

Most frequent character per category

ValueCountFrequency (%)
0114253
23.8%
1111181
23.2%
253789
11.2%
553150
11.1%
342929
 
9.0%
434124
 
7.1%
819275
 
4.0%
717325
 
3.6%
916636
 
3.5%
616437
 
3.4%
ValueCountFrequency (%)
:128367
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common607466
100.0%

Most frequent character per script

ValueCountFrequency (%)
:128367
21.1%
0114253
18.8%
1111181
18.3%
253789
8.9%
553150
8.7%
342929
 
7.1%
434124
 
5.6%
819275
 
3.2%
717325
 
2.9%
916636
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII607466
100.0%

Most frequent character per block

ValueCountFrequency (%)
:128367
21.1%
0114253
18.8%
1111181
18.3%
253789
8.9%
553150
8.7%
342929
 
7.1%
434124
 
5.6%
819275
 
3.2%
717325
 
2.9%
916636
 
2.7%

BOROUGH
Categorical

MISSING

Distinct: 5
Distinct (%): < 0.1%
Missing: 44591
Missing (%): 34.7%
Memory size: 6.5 MiB

BROOKLYN: 29091
QUEENS: 23503
BRONX: 16186
MANHATTAN: 12198
STATEN ISLAND: 2798

Length

Max length: 13
Median length: 8
Mean length: 7.171886937
Min length: 5

Characters and Unicode

Total characters: 600832
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BROOKLYN
2nd row: BRONX
3rd row: MANHATTAN
4th row: BROOKLYN
5th row: BROOKLYN
Value | Count | Frequency (%)
BROOKLYN | 29091 | 22.7%
QUEENS | 23503 | 18.3%
BRONX | 16186 | 12.6%
MANHATTAN | 12198 | 9.5%
STATEN ISLAND | 2798 | 2.2%
(Missing) | 44591 | 34.7%
[Figure: histogram of lengths of the category]

Value | Count | Frequency (%)
brooklyn | 29091 | 33.6%
queens | 23503 | 27.1%
bronx | 16186 | 18.7%
manhattan | 12198 | 14.1%
island | 2798 | 3.2%
staten | 2798 | 3.2%

Most occurring characters

ValueCountFrequency (%)
N98772
16.4%
O74368
12.4%
E49804
 
8.3%
B45277
 
7.5%
R45277
 
7.5%
A42190
 
7.0%
L31889
 
5.3%
T29992
 
5.0%
S29099
 
4.8%
K29091
 
4.8%
Other values (9)125073
20.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter598034
99.5%
Space Separator2798
 
0.5%

Most frequent character per category

ValueCountFrequency (%)
N98772
16.5%
O74368
12.4%
E49804
8.3%
B45277
 
7.6%
R45277
 
7.6%
A42190
 
7.1%
L31889
 
5.3%
T29992
 
5.0%
S29099
 
4.9%
K29091
 
4.9%
Other values (8)122275
20.4%
ValueCountFrequency (%)
2798
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin598034
99.5%
Common2798
 
0.5%

Most frequent character per script

ValueCountFrequency (%)
N98772
16.5%
O74368
12.4%
E49804
8.3%
B45277
 
7.6%
R45277
 
7.6%
A42190
 
7.1%
L31889
 
5.3%
T29992
 
5.0%
S29099
 
4.9%
K29091
 
4.9%
Other values (8)122275
20.4%
ValueCountFrequency (%)
2798
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII600832
100.0%

Most frequent character per block

ValueCountFrequency (%)
N98772
16.4%
O74368
12.4%
E49804
 
8.3%
B45277
 
7.5%
R45277
 
7.5%
A42190
 
7.0%
L31889
 
5.3%
T29992
 
5.0%
S29099
 
4.8%
K29091
 
4.8%
Other values (9)125073
20.8%

ZIP CODE
Real number (ℝ≥0)

MISSING

Distinct: 204
Distinct (%): 0.2%
Missing: 44600
Missing (%): 34.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 10912.96565
Minimum: 10000
Maximum: 11697
Zeros: 0
Zeros (%): 0.0%
Memory size: 1003.0 KiB

Quantile statistics

Minimum: 10000
5-th percentile: 10016
Q1: 10458
median: 11210
Q3: 11354
95-th percentile: 11432
Maximum: 11697
Range: 1697
Interquartile range (IQR): 896

Descriptive statistics

Standard deviation: 513.1528987
Coefficient of variation (CV): 0.04702231409
Kurtosis: -1.21036805
Mean: 10912.96565
Median Absolute Deviation (MAD): 202
Skewness: -0.6206983569
Sum: 914146394
Variance: 263325.8975
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Value | Count | Frequency (%)
11207 | 2038 | 1.6%
11236 | 1596 | 1.2%
11212 | 1518 | 1.2%
11208 | 1415 | 1.1%
11203 | 1355 | 1.1%
11385 | 1317 | 1.0%
11234 | 1238 | 1.0%
11434 | 1201 | 0.9%
11226 | 1171 | 0.9%
11233 | 1097 | 0.9%
Other values (194) | 69821 | 54.4%
(Missing) | 44600 | 34.7%

Value | Count | Frequency (%)
10000 | 19 | < 0.1%
10001 | 488 | 0.4%
10002 | 699 | 0.5%
10003 | 361 | 0.3%
10004 | 65 | 0.1%
10005 | 44 | < 0.1%
10006 | 52 | < 0.1%
10007 | 150 | 0.1%
10009 | 271 | 0.2%
10010 | 302 | 0.2%

Value | Count | Frequency (%)
11697 | 11 | < 0.1%
11695 | 1 | < 0.1%
11694 | 132 | 0.1%
11693 | 124 | 0.1%
11692 | 143 | 0.1%
11691 | 613 | 0.5%
11436 | 256 | 0.2%
11435 | 669 | 0.5%
11434 | 1201 | 0.9%
11433 | 512 | 0.4%
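ZIP CODE is profiled as a real number, so the report computes a mean, variance and skewness that have no meaning for a postal code. One possible treatment, sketched here under the assumption that the values arrive as floats with NaN for missing entries, is to convert them to five-character labels before profiling. The helper `zip_to_label` is hypothetical.

```python
from typing import Optional

def zip_to_label(value: Optional[float]) -> Optional[str]:
    """Return a 5-character ZIP label, or None for a missing value."""
    # NaN is the only float that is not equal to itself.
    if value is None or value != value:
        return None
    # Zero-pad so that the label always has five characters.
    return f"{int(value):05d}"

labels = [zip_to_label(v) for v in (11207.0, 10000.0, float("nan"))]
print(labels)
```

With string labels the column would be summarised by value counts only, which is the part of this section that carries information.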

LATITUDE
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
SKEWED

Distinct: 38213
Distinct (%): 32.3%
Missing: 10096
Missing (%): 7.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.67035821
Minimum: 0
Maximum: 40.912884
Zeros: 163
Zeros (%): 0.1%
Memory size: 1003.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 40.59918
Q1: 40.66645
median: 40.71532
Q3: 40.791084
95-th percentile: 40.86544
Maximum: 40.912884
Range: 40.912884
Interquartile range (IQR): 0.124634

Descriptive statistics

Standard deviation: 1.513132419
Coefficient of variation (CV): 0.03720479695
Kurtosis: 716.3483792
Mean: 40.67035821
Median Absolute Deviation (MAD): 0.055432
Skewness: -26.76189702
Sum: 4810123.935
Variance: 2.289569717
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
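The extreme skewness and kurtosis are consistent with the 163 zero values sitting far from the rest of the distribution, whose interquartile range is only about 0.12 degrees. The sketch below illustrates the effect on a small, made-up sample (not the report's data). It uses the population form of skewness, which is a simplification and not necessarily the estimator the profiler applies.

```python
from statistics import fmean, pstdev

def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    mean = fmean(values)
    spread = pstdev(values)
    return fmean([((v - mean) / spread) ** 3 for v in values])

# Hypothetical latitudes in a narrow band, repeated to get 100 points.
plausible = [40.60, 40.67, 40.72, 40.79, 40.87] * 20

# The same sample with a single zero placeholder appended.
with_zero = plausible + [0.0]

print(f"without zero: {skewness(plausible):.2f}")
print(f"with one zero: {skewness(with_zero):.2f}")
```

A single placeholder among a hundred points is enough to push the skewness strongly negative, which suggests treating zero coordinates as missing before interpreting these statistics.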
Value | Count | Frequency (%)
0 | 163 | 0.1%
40.861862 | 130 | 0.1%
40.820305 | 70 | 0.1%
40.675735 | 65 | 0.1%
40.651863 | 64 | < 0.1%
40.65965 | 63 | < 0.1%
40.69168 | 62 | < 0.1%
40.696033 | 60 | < 0.1%
40.8047 | 60 | < 0.1%
40.83801 | 58 | < 0.1%
Other values (38203) | 117476 | 91.5%
(Missing) | 10096 | 7.9%
ValueCountFrequency (%)
0163
0.1%
40.5041161
 
< 0.1%
40.504471
 
< 0.1%
40.5044821
 
< 0.1%
40.504651
 
< 0.1%
40.5050621
 
< 0.1%
40.505261
 
< 0.1%
40.5061872
 
< 0.1%
40.506671
 
< 0.1%
40.5067562
 
< 0.1%
ValueCountFrequency (%)
40.9128841
< 0.1%
40.9124681
< 0.1%
40.912221
< 0.1%
40.912171
< 0.1%
40.9120181
< 0.1%
40.9116671
< 0.1%
40.9110681
< 0.1%
40.91091
< 0.1%
40.910761
< 0.1%
40.910381
< 0.1%

LONGITUDE
Real number (ℝ)

HIGH CORRELATION
MISSING
SKEWED

Distinct: 29126
Distinct (%): 24.6%
Missing: 10096
Missing (%): 7.9%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.81002768
Minimum: -74.253006
Maximum: 0
Zeros: 163
Zeros (%): 0.1%
Memory size: 1003.0 KiB

Quantile statistics

Minimum: -74.253006
5-th percentile: -74.018845
Q1: -73.95836
median: -73.91696
Q3: -73.86384
95-th percentile: -73.76084
Maximum: 0
Range: 74.253006
Interquartile range (IQR): 0.09452

Descriptive statistics

Standard deviation: 2.743264834
Coefficient of variation (CV): -0.03716656015
Kurtosis: 719.3131649
Mean: -73.81002768
Median Absolute Deviation (MAD): 0.046434
Skewness: 26.84483109
Sum: -8729585.784
Variance: 7.525501947
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Value | Count | Frequency (%)
0 | 163 | 0.1%
-73.91282 | 134 | 0.1%
-73.89083 | 80 | 0.1%
-73.89063 | 76 | 0.1%
-73.86536 | 73 | 0.1%
-73.89686 | 72 | 0.1%
-73.91243 | 70 | 0.1%
-73.98453 | 66 | 0.1%
-73.773834 | 62 | < 0.1%
-73.93755 | 61 | < 0.1%
Other values (29116) | 117414 | 91.5%
(Missing) | 10096 | 7.9%
ValueCountFrequency (%)
-74.2530061
 
< 0.1%
-74.250761
 
< 0.1%
-74.250471
 
< 0.1%
-74.250151
 
< 0.1%
-74.249762
< 0.1%
-74.249491
 
< 0.1%
-74.249411
 
< 0.1%
-74.2488861
 
< 0.1%
-74.248573
< 0.1%
-74.248281
 
< 0.1%
ValueCountFrequency (%)
0163
0.1%
-73.7005841
 
< 0.1%
-73.700992
 
< 0.1%
-73.701291
 
< 0.1%
-73.70151
 
< 0.1%
-73.701743
 
< 0.1%
-73.701771
 
< 0.1%
-73.701911
 
< 0.1%
-73.701923
 
< 0.1%
-73.701941
 
< 0.1%

LOCATION
Categorical

HIGH CARDINALITY
MISSING

Distinct: 52867
Distinct (%): 44.7%
Missing: 10096
Missing (%): 7.9%
Memory size: 9.2 MiB

(0.0, 0.0): 163
(40.861862, -73.91282): 129
(40.820305, -73.89083): 70
(40.675735, -73.89686): 65
(40.65965, -73.773834): 62
Other values (52862): 117782

Length

Max length: 23
Median length: 22
Mean length: 21.71183976
Min length: 10

Characters and Unicode

Total characters: 2567881
Distinct characters: 16
Distinct categories: 6
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33635
Unique (%): 28.4%

Sample

1st row: (40.668266, -73.84214)
2nd row: (40.700527, -73.94161)
3rd row: (40.843033, -73.881805)
4th row: (40.75974, -73.97423)
5th row: (40.74955, -74.00654)
Value | Count | Frequency (%)
(0.0, 0.0) | 163 | 0.1%
(40.861862, -73.91282) | 129 | 0.1%
(40.820305, -73.89083) | 70 | 0.1%
(40.675735, -73.89686) | 65 | 0.1%
(40.65965, -73.773834) | 62 | < 0.1%
(40.696033, -73.98453) | 60 | < 0.1%
(40.8047, -73.91243) | 60 | < 0.1%
(40.651863, -73.86536) | 59 | < 0.1%
(40.83801, -73.87329) | 56 | < 0.1%
(40.668495, -73.925606) | 56 | < 0.1%
Other values (52857) | 117491 | 91.5%
(Missing) | 10096 | 7.9%
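LOCATION stores each coordinate pair as a string such as `(40.668266, -73.84214)`, and its most frequent value is `(0.0, 0.0)`. A minimal sketch of parsing these strings follows; the helper name is hypothetical, and treating `(0.0, 0.0)` as a placeholder for a missing location is an assumption based on the latitude and longitude ranges in this report.

```python
import ast
from typing import Optional, Tuple

def parse_location(text: str) -> Optional[Tuple[float, float]]:
    """Parse '(lat, lon)' text into a tuple of floats (hypothetical helper)."""
    # literal_eval safely evaluates the tuple literal without running code.
    lat, lon = ast.literal_eval(text)
    # Assumption: (0.0, 0.0) marks an unknown location, not a real crash site.
    if lat == 0.0 and lon == 0.0:
        return None
    return float(lat), float(lon)

print(parse_location("(40.668266, -73.84214)"))
print(parse_location("(0.0, 0.0)"))
```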
[Figure: histogram of lengths of the category]

Value | Count | Frequency (%)
0.0 | 326 | 0.1%
73.91282 | 134 | 0.1%
40.861862 | 130 | 0.1%
73.89083 | 80 | < 0.1%
73.89063 | 76 | < 0.1%
73.86536 | 73 | < 0.1%
73.89686 | 72 | < 0.1%
40.820305 | 70 | < 0.1%
73.91243 | 70 | < 0.1%
73.98453 | 66 | < 0.1%
Other values (67328) | 235445 | 99.5%

Most occurring characters

ValueCountFrequency (%)
7275948
10.7%
4244229
 
9.5%
.236542
 
9.2%
3215022
 
8.4%
0204443
 
8.0%
8163070
 
6.4%
6161196
 
6.3%
9155051
 
6.0%
5124712
 
4.9%
(118271
 
4.6%
Other values (6)669397
26.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1740147
67.8%
Other Punctuation354813
 
13.8%
Open Punctuation118271
 
4.6%
Space Separator118271
 
4.6%
Close Punctuation118271
 
4.6%
Dash Punctuation118108
 
4.6%

Most frequent character per category

ValueCountFrequency (%)
7275948
15.9%
4244229
14.0%
3215022
12.4%
0204443
11.7%
8163070
9.4%
6161196
9.3%
9155051
8.9%
5124712
7.2%
299859
 
5.7%
196617
 
5.6%
ValueCountFrequency (%)
.236542
66.7%
,118271
33.3%
ValueCountFrequency (%)
(118271
100.0%
ValueCountFrequency (%)
118271
100.0%
ValueCountFrequency (%)
-118108
100.0%
ValueCountFrequency (%)
)118271
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2567881
100.0%

Most frequent character per script

ValueCountFrequency (%)
7275948
10.7%
4244229
 
9.5%
.236542
 
9.2%
3215022
 
8.4%
0204443
 
8.0%
8163070
 
6.4%
6161196
 
6.3%
9155051
 
6.0%
5124712
 
4.9%
(118271
 
4.6%
Other values (6)669397
26.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2567881
100.0%

Most frequent character per block

ValueCountFrequency (%)
7275948
10.7%
4244229
 
9.5%
.236542
 
9.2%
3215022
 
8.4%
0204443
 
8.0%
8163070
 
6.4%
6161196
 
6.3%
9155051
 
6.0%
5124712
 
4.9%
(118271
 
4.6%
Other values (6)669397
26.1%

ON STREET NAME
Categorical

HIGH CARDINALITY
MISSING

Distinct: 4776
Distinct (%): 5.0%
Missing: 33769
Missing (%): 26.3%
Memory size: 9.1 MiB

BELT PARKWAY: 2211
LONG ISLAND EXPRESSWAY: 1248
BROOKLYN QUEENS EXPRESSWAY: 1204
FDR DRIVE: 1202
BROADWAY: 1022
Other values (4771): 87711

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 3027136
Distinct characters: 70
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1579
Unique (%): 1.7%

Sample

1st row: CROSS ISLAND PARKWAY
2nd row: W 57 & 8th Ave
3rd row: CROSS BAY BOULEVARD
4th row: NORTHERN BOULEVARD
5th row: EAST 53 STREET
Value | Count | Frequency (%)
BELT PARKWAY | 2211 | 1.7%
LONG ISLAND EXPRESSWAY | 1248 | 1.0%
BROOKLYN QUEENS EXPRESSWAY | 1204 | 0.9%
FDR DRIVE | 1202 | 0.9%
BROADWAY | 1022 | 0.8%
MAJOR DEEGAN EXPRESSWAY | 1012 | 0.8%
GRAND CENTRAL PKWY | 996 | 0.8%
CROSS BRONX EXPY | 931 | 0.7%
ATLANTIC AVENUE | 910 | 0.7%
CROSS ISLAND PARKWAY | 880 | 0.7%
Other values (4766) | 82982 | 64.6%
(Missing) | 33769 | 26.3%
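Every ON STREET NAME value is reported as exactly 32 characters long, and spaces make up 58.8% of all characters, which points to fixed-width padding. The sketch below checks that reading against the reported totals and shows one way to normalise a padded value; `clean_street_name` is a hypothetical helper, and the padded example is constructed here for illustration.

```python
# Figures copied from the report.
n_rows = 128_367
missing = 33_769
fixed_width = 32
total_characters = 3_027_136

# If every non-missing value is padded to 32 characters,
# the character total should equal rows * width.
non_missing = n_rows - missing
padding_explains_total = non_missing * fixed_width == total_characters

def clean_street_name(text: str) -> str:
    """Trim padding and collapse internal runs of whitespace."""
    return " ".join(text.split())

# Constructed example: a name right-padded with spaces to the fixed width.
padded = "BELT PARKWAY".ljust(fixed_width)

print(padding_explains_total, repr(clean_street_name(padded)))
```

Stripping the padding before profiling would make the length statistics and the character counts describe the street names themselves.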
[Figure: histogram of lengths of the category]

Value | Count | Frequency (%)
avenue | 33517 | 15.2%
street | 25628 | 11.6%
east | 7620 | 3.5%
parkway | 6644 | 3.0%
boulevard | 6578 | 3.0%
expressway | 6254 | 2.8%
west | 4947 | 2.2%
road | 3570 | 1.6%
island | 2678 | 1.2%
cross | 2303 | 1.0%
Other values (2586) | 120719 | 54.8%

Most occurring characters

ValueCountFrequency (%)
1778461
58.8%
E208548
 
6.9%
A118961
 
3.9%
R104027
 
3.4%
T94495
 
3.1%
N86853
 
2.9%
S82954
 
2.7%
U55287
 
1.8%
O53205
 
1.8%
V48789
 
1.6%
Other values (60)395556
 
13.1%

Most occurring categories

ValueCountFrequency (%)
Space Separator1778461
58.8%
Uppercase Letter1180929
39.0%
Decimal Number60124
 
2.0%
Lowercase Letter6613
 
0.2%
Other Punctuation338
 
< 0.1%
Open Punctuation334
 
< 0.1%
Close Punctuation334
 
< 0.1%
Dash Punctuation3
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
E208548
17.7%
A118961
10.1%
R104027
 
8.8%
T94495
 
8.0%
N86853
 
7.4%
S82954
 
7.0%
U55287
 
4.7%
O53205
 
4.5%
V48789
 
4.1%
L39006
 
3.3%
Other values (16)288804
24.5%
ValueCountFrequency (%)
e1093
16.5%
a652
 
9.9%
t627
 
9.5%
r599
 
9.1%
n448
 
6.8%
s419
 
6.3%
o341
 
5.2%
v263
 
4.0%
u262
 
4.0%
l248
 
3.8%
Other values (16)1661
25.1%
ValueCountFrequency (%)
115057
25.0%
26683
11.1%
36542
10.9%
45249
 
8.7%
55088
 
8.5%
64705
 
7.8%
84696
 
7.8%
74272
 
7.1%
03925
 
6.5%
93907
 
6.5%
ValueCountFrequency (%)
.257
76.0%
/66
 
19.5%
&12
 
3.6%
'3
 
0.9%
ValueCountFrequency (%)
1778461
100.0%
ValueCountFrequency (%)
-3
100.0%
ValueCountFrequency (%)
(334
100.0%
ValueCountFrequency (%)
)334
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1839594
60.8%
Latin1187542
39.2%

Most frequent character per script

ValueCountFrequency (%)
E208548
17.6%
A118961
10.0%
R104027
 
8.8%
T94495
 
8.0%
N86853
 
7.3%
S82954
 
7.0%
U55287
 
4.7%
O53205
 
4.5%
V48789
 
4.1%
L39006
 
3.3%
Other values (42)295417
24.9%
ValueCountFrequency (%)
1778461
96.7%
115057
 
0.8%
26683
 
0.4%
36542
 
0.4%
45249
 
0.3%
55088
 
0.3%
64705
 
0.3%
84696
 
0.3%
74272
 
0.2%
03925
 
0.2%
Other values (8)4916
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII3027136
100.0%

Most frequent character per block

ValueCountFrequency (%)
1778461
58.8%
E208548
 
6.9%
A118961
 
3.9%
R104027
 
3.4%
T94495
 
3.1%
N86853
 
2.9%
S82954
 
2.7%
U55287
 
1.8%
O53205
 
1.8%
V48789
 
1.6%
Other values (60)395556
 
13.1%

CROSS STREET NAME
Categorical

HIGH CARDINALITY
MISSING

Distinct: 5270
Distinct (%): 8.7%
Missing: 68115
Missing (%): 53.1%
Memory size: 6.1 MiB

3 AVENUE: 562
BROADWAY: 523
2 AVENUE: 387
LINDEN BOULEVARD: 346
5 AVENUE: 319
Other values (5265): 58115

Length

Max length: 32
Median length: 13
Mean length: 13.18376153
Min length: 1

Characters and Unicode

Total characters: 794348
Distinct characters: 73
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1748
Unique (%): 2.9%

Sample

1st row: W 57
2nd row: SOUTH CONDUIT AVENUE
3rd row: 68 STREET
4th row: MADISON AVENUE
5th row: NORTHERN BOULEVARD
Value | Count | Frequency (%)
3 AVENUE | 562 | 0.4%
BROADWAY | 523 | 0.4%
2 AVENUE | 387 | 0.3%
LINDEN BOULEVARD | 346 | 0.3%
5 AVENUE | 319 | 0.2%
PARK AVENUE | 311 | 0.2%
1 AVENUE | 290 | 0.2%
ATLANTIC AVENUE | 274 | 0.2%
BRUCKNER BOULEVARD | 268 | 0.2%
7 AVENUE | 245 | 0.2%
Other values (5260) | 56727 | 44.2%
(Missing) | 68115 | 53.1%
[Figure: histogram of lengths of the category]

Value | Count | Frequency (%)
avenue | 26048 | 19.5%
street | 21239 | 15.9%
east | 5900 | 4.4%
boulevard | 3595 | 2.7%
road | 2682 | 2.0%
west | 2572 | 1.9%
place | 1610 | 1.2%
parkway | 1323 | 1.0%
expressway | 833 | 0.6%
park | 675 | 0.5%
Other values (2895) | 67078 | 50.2%

Most occurring characters

ValueCountFrequency (%)
E137189
17.3%
73306
 
9.2%
T67505
 
8.5%
A67113
 
8.4%
R54450
 
6.9%
N50480
 
6.4%
S46479
 
5.9%
U36535
 
4.6%
V33232
 
4.2%
O27783
 
3.5%
Other values (63)200276
25.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter663364
83.5%
Space Separator73306
 
9.2%
Decimal Number51042
 
6.4%
Lowercase Letter6605
 
0.8%
Other Punctuation28
 
< 0.1%
Control1
 
< 0.1%
Other Number1
 
< 0.1%
Dash Punctuation1
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
e1232
18.7%
t652
9.9%
a633
9.6%
r532
 
8.1%
n480
 
7.3%
s403
 
6.1%
o303
 
4.6%
u298
 
4.5%
v293
 
4.4%
l249
 
3.8%
Other values (17)1530
23.2%
ValueCountFrequency (%)
E137189
20.7%
T67505
10.2%
A67113
10.1%
R54450
 
8.2%
N50480
 
7.6%
S46479
 
7.0%
U36535
 
5.5%
V33232
 
5.0%
O27783
 
4.2%
L21398
 
3.2%
Other values (16)121200
18.3%
ValueCountFrequency (%)
112085
23.7%
25965
11.7%
35406
10.6%
44635
 
9.1%
54490
 
8.8%
73970
 
7.8%
63878
 
7.6%
83877
 
7.6%
93400
 
6.7%
03336
 
6.5%
ValueCountFrequency (%)
/11
39.3%
.7
25.0%
&6
21.4%
,2
 
7.1%
¿1
 
3.6%
'1
 
3.6%
ValueCountFrequency (%)
73306
100.0%
ValueCountFrequency (%)
1
100.0%
ValueCountFrequency (%)
½1
100.0%
ValueCountFrequency (%)
-1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin669969
84.3%
Common124379
 
15.7%

Most frequent character per script

ValueCountFrequency (%)
E137189
20.5%
T67505
10.1%
A67113
10.0%
R54450
 
8.1%
N50480
 
7.5%
S46479
 
6.9%
U36535
 
5.5%
V33232
 
5.0%
O27783
 
4.1%
L21398
 
3.2%
Other values (43)127805
19.1%
ValueCountFrequency (%)
73306
58.9%
112085
 
9.7%
25965
 
4.8%
35406
 
4.3%
44635
 
3.7%
54490
 
3.6%
73970
 
3.2%
63878
 
3.1%
83877
 
3.1%
93400
 
2.7%
Other values (10)3367
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII794345
> 99.9%
None3
 
< 0.1%

Most frequent character per block

ValueCountFrequency (%)
E137189
17.3%
73306
 
9.2%
T67505
 
8.5%
A67113
 
8.4%
R54450
 
6.9%
N50480
 
6.4%
S46479
 
5.9%
U36535
 
4.6%
V33232
 
4.2%
O27783
 
3.5%
Other values (60)200273
25.2%
ValueCountFrequency (%)
ï1
33.3%
¿1
33.3%
½1
33.3%

OFF STREET NAME
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 28958
Distinct (%): 85.8%
Missing: 94598
Missing (%): 73.7%
Memory size: 6.0 MiB

772 EDGEWATER ROAD: 35
625 ATLANTIC AVENUE: 22
815 HUTCHINSON RIVER PARKWAY: 21
501 GATEWAY DRIVE: 19
355 FOOD CENTER DRIVE: 19
Other values (28953): 33653

Length

Max length: 40
Median length: 40
Mean length: 40
Min length: 40

Characters and Unicode

Total characters: 1350760
Distinct characters: 69
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 25915
Unique (%): 76.7%

Sample

1st row: 760 BROADWAY
2nd row: 948 EAST 179 STREET
3rd row: 793 FLATBUSH AVENUE
4th row: 1539 PARK PLACE
5th row: 1316 UTICA AVENUE
Value | Count | Frequency (%)
772 EDGEWATER ROAD | 35 | < 0.1%
625 ATLANTIC AVENUE | 22 | < 0.1%
815 HUTCHINSON RIVER PARKWAY | 21 | < 0.1%
501 GATEWAY DRIVE | 19 | < 0.1%
355 FOOD CENTER DRIVE | 19 | < 0.1%
110-00 ROCKAWAY BOULEVARD | 19 | < 0.1%
450 FLATBUSH AVENUE | 18 | < 0.1%
1400 PELHAM PARKWAY SOUTH | 16 | < 0.1%
63 FLUSHING AVENUE | 16 | < 0.1%
519 GATEWAY DRIVE | 16 | < 0.1%
Other values (28948) | 33568 | 26.2%
(Missing) | 94598 | 73.7%
[Figure: histogram of lengths of the category]

Value | Count | Frequency (%)
avenue | 14010 | 12.8%
street | 12639 | 11.6%
east | 3709 | 3.4%
boulevard | 1929 | 1.8%
west | 1877 | 1.7%
road | 1609 | 1.5%
place | 771 | 0.7%
parkway | 754 | 0.7%
drive | 471 | 0.4%
broadway | 421 | 0.4%
Other values (11343) | 71237 | 65.1%

Most occurring characters

ValueCountFrequency (%)
810625
60.0%
E78430
 
5.8%
T41761
 
3.1%
A39169
 
2.9%
R31506
 
2.3%
N28886
 
2.1%
127887
 
2.1%
S27882
 
2.1%
U20076
 
1.5%
218400
 
1.4%
Other values (59)226138
 
16.7%

Most occurring categories

ValueCountFrequency (%)
Space Separator810625
60.0%
Uppercase Letter385680
28.6%
Decimal Number144412
 
10.7%
Dash Punctuation7696
 
0.6%
Lowercase Letter2330
 
0.2%
Other Punctuation15
 
< 0.1%
Control1
 
< 0.1%
Connector Punctuation1
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
E78430
20.3%
T41761
10.8%
A39169
10.2%
R31506
8.2%
N28886
 
7.5%
S27882
 
7.2%
U20076
 
5.2%
V18006
 
4.7%
O16142
 
4.2%
L12350
 
3.2%
Other values (16)71472
18.5%
ValueCountFrequency (%)
e394
16.9%
t244
10.5%
r225
9.7%
a212
 
9.1%
s158
 
6.8%
n143
 
6.1%
o124
 
5.3%
d98
 
4.2%
l96
 
4.1%
v92
 
3.9%
Other values (16)544
23.3%
ValueCountFrequency (%)
127887
19.3%
218400
12.7%
015999
11.1%
314607
10.1%
514488
10.0%
412874
8.9%
610928
 
7.6%
710249
 
7.1%
89931
 
6.9%
99049
 
6.3%
ValueCountFrequency (%)
/7
46.7%
.7
46.7%
!1
 
6.7%
ValueCountFrequency (%)
810625
100.0%
ValueCountFrequency (%)
-7696
100.0%
ValueCountFrequency (%)
1
100.0%
ValueCountFrequency (%)
_1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common962750
71.3%
Latin388010
28.7%

Most frequent character per script

ValueCountFrequency (%)
E78430
20.2%
T41761
10.8%
A39169
10.1%
R31506
8.1%
N28886
 
7.4%
S27882
 
7.2%
U20076
 
5.2%
V18006
 
4.6%
O16142
 
4.2%
L12350
 
3.2%
Other values (42)73802
19.0%
ValueCountFrequency (%)
810625
84.2%
127887
 
2.9%
218400
 
1.9%
015999
 
1.7%
314607
 
1.5%
514488
 
1.5%
412874
 
1.3%
610928
 
1.1%
710249
 
1.1%
89931
 
1.0%
Other values (7)16762
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII1350760
100.0%

Most frequent character per block

ValueCountFrequency (%)
810625
60.0%
E78430
 
5.8%
T41761
 
3.1%
A39169
 
2.9%
R31506
 
2.3%
N28886
 
2.1%
127887
 
2.1%
S27882
 
2.1%
U20076
 
1.5%
218400
 
1.4%
Other values (59)226138
 
16.7%

NUMBER OF PERSONS INJURED
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 14
Distinct (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3942554882
Minimum: 0
Maximum: 16
Zeros: 90759
Zeros (%): 70.7%
Memory size: 1003.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 2
Maximum: 16
Range: 16
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.749183193
Coefficient of variation (CV): 1.900247975
Kurtosis: 16.64754482
Mean: 0.3942554882
Median Absolute Deviation (MAD): 0
Skewness: 3.029441689
Sum: 50609
Variance: 0.5612754567
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=14)]
Value | Count | Frequency (%)
0 | 90759 | 70.7%
1 | 29191 | 22.7%
2 | 5576 | 4.3%
3 | 1795 | 1.4%
4 | 653 | 0.5%
5 | 243 | 0.2%
6 | 72 | 0.1%
7 | 39 | < 0.1%
8 | 18 | < 0.1%
9 | 9 | < 0.1%
Other values (4) | 11 | < 0.1%

Value | Count | Frequency (%)
0 | 90759 | 70.7%
1 | 29191 | 22.7%
2 | 5576 | 4.3%
3 | 1795 | 1.4%
4 | 653 | 0.5%
5 | 243 | 0.2%
6 | 72 | 0.1%
7 | 39 | < 0.1%
8 | 18 | < 0.1%
9 | 9 | < 0.1%

Value | Count | Frequency (%)
16 | 1 | < 0.1%
15 | 1 | < 0.1%
11 | 3 | < 0.1%
10 | 6 | < 0.1%
9 | 9 | < 0.1%
8 | 18 | < 0.1%
7 | 39 | < 0.1%
6 | 72 | 0.1%
5 | 243 | 0.2%
4 | 653 | 0.5%
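The descriptive statistics for NUMBER OF PERSONS INJURED can be rebuilt from the value counts, because the tables in this section between them list all 14 distinct values. The sketch below does this with the standard library. It assumes the variance uses the sample form (dividing by n - 1), which reproduces the reported figure to the precision checked here.

```python
from math import sqrt

# Value -> count, combining the common-values and extreme-values tables.
counts = {
    0: 90759, 1: 29191, 2: 5576, 3: 1795, 4: 653, 5: 243, 6: 72,
    7: 39, 8: 18, 9: 9, 10: 6, 11: 3, 15: 1, 16: 1,
}

n = sum(counts.values())                        # non-missing observations
total = sum(v * c for v, c in counts.items())   # the "Sum" statistic
mean = total / n

# Sample variance (assumed n - 1 denominator), then derived statistics.
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
std = sqrt(variance)
cv = std / mean                                 # coefficient of variation

print(n, total, round(mean, 4), round(variance, 4), round(cv, 4))
```

The count of 128366 is one short of the 128367 observations, matching the single missing value reported for this variable.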

NUMBER OF PERSONS KILLED
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 MiB

0: 128085
1: 269
2: 10
3: 2
4: 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 128367
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0
Value | Count | Frequency (%)
0 | 128085 | 99.8%
1 | 269 | 0.2%
2 | 10 | < 0.1%
3 | 2 | < 0.1%
4 | 1 | < 0.1%
[Figure: histogram of lengths of the category]

Most occurring characters

ValueCountFrequency (%)
0128085
99.8%
1269
 
0.2%
210
 
< 0.1%
32
 
< 0.1%
41
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number128367
100.0%

Most frequent character per category

ValueCountFrequency (%)
0128085
99.8%
1269
 
0.2%
210
 
< 0.1%
32
 
< 0.1%
41
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common128367
100.0%

Most frequent character per script

ValueCountFrequency (%)
0128085
99.8%
1269
 
0.2%
210
 
< 0.1%
32
 
< 0.1%
41
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII128367
100.0%

Most frequent character per block

ValueCountFrequency (%)
0128085
99.8%
1269
 
0.2%
210
 
< 0.1%
32
 
< 0.1%
41
 
< 0.1%

NUMBER OF PEDESTRIANS INJURED
Real number (ℝ≥0)

ZEROS

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.05964149665
Minimum: 0
Maximum: 7
Zeros: 120992
Zeros (%): 94.3%
Memory size: 1003.0 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum7
Range7
Interquartile range (IQR)0

Descriptive statistics

Standard deviation: 0.2480685043
Coefficient of variation (CV): 4.159327284
Kurtosis: 29.04223204
Mean: 0.05964149665
Median Absolute Deviation (MAD): 0
Skewness: 4.574181181
Sum: 7656
Variance: 0.06153798281
Monotonicity: Not monotonic
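
Several of the figures above can be recomputed from this variable's frequency table alone. The following is a minimal sketch, assuming the variance is the sample variance (dividing by n - 1, the pandas default); the variable names are illustrative and not taken from the report.

```python
import math

# Value -> count, copied from the frequency table for
# NUMBER OF PEDESTRIANS INJURED.
value_counts = {0: 120992, 1: 7139, 2: 205, 3: 23, 4: 5, 5: 1, 6: 1, 7: 1}

n = sum(value_counts.values())                       # number of observations
total = sum(v * c for v, c in value_counts.items())  # reported as "Sum"
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1
# (assumed to match the report's convention).
sum_sq_dev = sum(c * (v - mean) ** 2 for v, c in value_counts.items())
variance = sum_sq_dev / (n - 1)
std = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

print(f"n={n} sum={total} mean={mean:.10f} var={variance:.10f} cv={cv:.6f}")
```

A coefficient of variation above 4 reflects how strongly the zeros dominate: the mean is tiny compared with the spread introduced by the few non-zero rows.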
Histogram with fixed size bins (bins=8)
Value  Count   Frequency (%)
0      120992  94.3%
1      7139    5.6%
2      205     0.2%
3      23      < 0.1%
4      5       < 0.1%
7      1       < 0.1%
6      1       < 0.1%
5      1       < 0.1%

Value  Count   Frequency (%)
0      120992  94.3%
1      7139    5.6%
2      205     0.2%
3      23      < 0.1%
4      5       < 0.1%
5      1       < 0.1%
6      1       < 0.1%
7      1       < 0.1%

Value  Count   Frequency (%)
7      1       < 0.1%
6      1       < 0.1%
5      1       < 0.1%
4      5       < 0.1%
3      23      < 0.1%
2      205     0.2%
1      7139    5.6%
0      120992  94.3%
Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.1 MiB
0  128246
1  120
2  1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters128367
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0
Value  Count   Frequency (%)
0      128246  99.9%
1      120     0.1%
2      1       < 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
0128246
99.9%
1120
 
0.1%
21
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0128246
99.9%
1120
 
0.1%
21
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number128367
100.0%

Most frequent character per category

ValueCountFrequency (%)
0128246
99.9%
1120
 
0.1%
21
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common128367
100.0%

Most frequent character per script

ValueCountFrequency (%)
0128246
99.9%
1120
 
0.1%
21
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII128367
100.0%

Most frequent character per block

ValueCountFrequency (%)
0128246
99.9%
1120
 
0.1%
21
 
< 0.1%
Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.1 MiB
0  122475
1  5765
2  125
3  2

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters128367
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0
Value  Count   Frequency (%)
0      122475  95.4%
1      5765    4.5%
2      125     0.1%
3      2       < 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
0122475
95.4%
15765
 
4.5%
2125
 
0.1%
32
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0122475
95.4%
15765
 
4.5%
2125
 
0.1%
32
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number128367
100.0%

Most frequent character per category

ValueCountFrequency (%)
0122475
95.4%
15765
 
4.5%
2125
 
0.1%
32
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common128367
100.0%

Most frequent character per script

ValueCountFrequency (%)
0122475
95.4%
15765
 
4.5%
2125
 
0.1%
32
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII128367
100.0%

Most frequent character per block

ValueCountFrequency (%)
0122475
95.4%
15765
 
4.5%
2125
 
0.1%
32
 
< 0.1%
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.1 MiB
0  128337
1  30

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters128367
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0
Value  Count   Frequency (%)
0      128337  > 99.9%
1      30      < 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
0128337
> 99.9%
130
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0128337
> 99.9%
130
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number128367
100.0%

Most frequent character per category

ValueCountFrequency (%)
0128337
> 99.9%
130
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common128367
100.0%

Most frequent character per script

ValueCountFrequency (%)
0128337
> 99.9%
130
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII128367
100.0%

Most frequent character per block

ValueCountFrequency (%)
0128337
> 99.9%
130
 
< 0.1%

NUMBER OF MOTORIST INJURED
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2877141321
Minimum: 0
Maximum: 16
Zeros: 103780
Zeros (%): 80.8%
Memory size: 1003.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 16
Range: 16
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.7168114962
Coefficient of variation (CV): 2.491401764
Kurtosis: 21.80086343
Mean: 0.2877141321
Median Absolute Deviation (MAD): 0
Skewness: 3.709884974
Sum: 36933
Variance: 0.5138187211
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Value             Count   Frequency (%)
0                 103780  80.8%
1                 16707   13.0%
2                 5111    4.0%
3                 1747    1.4%
4                 644     0.5%
5                 235     0.2%
6                 70      0.1%
7                 37      < 0.1%
8                 16      < 0.1%
9                 9       < 0.1%
Other values (4)  11      < 0.1%

Value  Count   Frequency (%)
0      103780  80.8%
1      16707   13.0%
2      5111    4.0%
3      1747    1.4%
4      644     0.5%
5      235     0.2%
6      70      0.1%
7      37      < 0.1%
8      16      < 0.1%
9      9       < 0.1%

Value  Count  Frequency (%)
16     1      < 0.1%
15     1      < 0.1%
11     3      < 0.1%
10     6      < 0.1%
9      9      < 0.1%
8      16     < 0.1%
7      37     < 0.1%
6      70     0.1%
5      235    0.2%
4      644    0.5%

NUMBER OF MOTORIST KILLED
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.1 MiB
0  128235
1  121
2  8
3  2
4  1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters128367
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0
Value  Count   Frequency (%)
0      128235  99.9%
1      121     0.1%
2      8       < 0.1%
3      2       < 0.1%
4      1       < 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
0128235
99.9%
1121
 
0.1%
28
 
< 0.1%
32
 
< 0.1%
41
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0128235
99.9%
1121
 
0.1%
28
 
< 0.1%
32
 
< 0.1%
41
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number128367
100.0%

Most frequent character per category

ValueCountFrequency (%)
0128235
99.9%
1121
 
0.1%
28
 
< 0.1%
32
 
< 0.1%
41
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common128367
100.0%

Most frequent character per script

ValueCountFrequency (%)
0128235
99.9%
1121
 
0.1%
28
 
< 0.1%
32
 
< 0.1%
41
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII128367
100.0%

Most frequent character per block

ValueCountFrequency (%)
0128235
99.9%
1121
 
0.1%
28
 
< 0.1%
32
 
< 0.1%
41
 
< 0.1%

CONTRIBUTING FACTOR VEHICLE 1
Categorical

HIGH CARDINALITY

Distinct55
Distinct (%)< 0.1%
Missing590
Missing (%)0.5%
Memory size9.5 MiB
Unspecified
33811 
Driver Inattention/Distraction
32243 
Following Too Closely
8514 
Failure to Yield Right-of-Way
8092 
Passing or Lane Usage Improper
4821 
Other values (50)
40296 

Length

Max length53
Median length20
Mean length21.02476972
Min length5

Characters and Unicode

Total characters2686482
Distinct characters52
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowTire Failure/Inadequate
2nd rowUnspecified
3rd rowDriver Inattention/Distraction
4th rowPedestrian/Bicyclist/Other Pedestrian Error/Confusion
5th rowDriver Inattention/Distraction
ValueCountFrequency (%)
Unspecified33811
26.3%
Driver Inattention/Distraction32243
25.1%
Following Too Closely8514
 
6.6%
Failure to Yield Right-of-Way8092
 
6.3%
Passing or Lane Usage Improper4821
 
3.8%
Passing Too Closely4675
 
3.6%
Backing Unsafely4577
 
3.6%
Other Vehicular3773
 
2.9%
Unsafe Speed3759
 
2.9%
Unsafe Lane Changing2890
 
2.3%
Other values (45)20622
16.1%
Histogram of lengths of the category
ValueCountFrequency (%)
driver34488
 
12.4%
unspecified33811
 
12.2%
inattention/distraction32243
 
11.6%
too13189
 
4.8%
closely13189
 
4.8%
to10246
 
3.7%
passing9496
 
3.4%
failure8518
 
3.1%
following8514
 
3.1%
yield8092
 
2.9%
Other values (93)105421
38.0%
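
The word-level counts above look consistent with lowercasing each category value and splitting it on whitespace, so that `Inattention/Distraction` stays a single token. The sketch below reproduces that behaviour under this assumption; `word_counts` is a hypothetical helper, and only the five most frequent values are included, so totals for words that also occur in rarer values (such as `driver`) come out lower than in the report.

```python
from collections import Counter


def word_counts(value_counts):
    """Aggregate category counts into lowercase, whitespace-split tokens."""
    words = Counter()
    for value, count in value_counts.items():
        for token in value.lower().split():
            words[token] += count
    return words


# Five most frequent values of CONTRIBUTING FACTOR VEHICLE 1, from the report.
top_values = {
    "Unspecified": 33811,
    "Driver Inattention/Distraction": 32243,
    "Following Too Closely": 8514,
    "Failure to Yield Right-of-Way": 8092,
    "Passing or Lane Usage Improper": 4821,
}

words = word_counts(top_values)
for token, count in words.most_common(5):
    print(token, count)
```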

Most occurring characters

ValueCountFrequency (%)
i288170
 
10.7%
e264106
 
9.8%
n241513
 
9.0%
t202968
 
7.6%
o172821
 
6.4%
r167803
 
6.2%
149430
 
5.6%
a142479
 
5.3%
s129683
 
4.8%
c91519
 
3.4%
Other values (42)835990
31.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2174095
80.9%
Uppercase Letter308224
 
11.5%
Space Separator149430
 
5.6%
Other Punctuation38069
 
1.4%
Dash Punctuation16304
 
0.6%
Open Punctuation180
 
< 0.1%
Close Punctuation180
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
i288170
13.3%
e264106
12.1%
n241513
11.1%
t202968
9.3%
o172821
7.9%
r167803
7.7%
a142479
 
6.6%
s129683
 
6.0%
c91519
 
4.2%
l89424
 
4.1%
Other values (15)383609
17.6%
ValueCountFrequency (%)
D72731
23.6%
U51883
16.8%
I44057
14.3%
C20482
 
6.6%
T18707
 
6.1%
F17799
 
5.8%
P13448
 
4.4%
R12321
 
4.0%
L9069
 
2.9%
W8173
 
2.7%
Other values (12)39554
12.8%
ValueCountFrequency (%)
149430
100.0%
ValueCountFrequency (%)
/38069
100.0%
ValueCountFrequency (%)
-16304
100.0%
ValueCountFrequency (%)
(180
100.0%
ValueCountFrequency (%)
)180
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2482319
92.4%
Common204163
 
7.6%

Most frequent character per script

ValueCountFrequency (%)
i288170
11.6%
e264106
 
10.6%
n241513
 
9.7%
t202968
 
8.2%
o172821
 
7.0%
r167803
 
6.8%
a142479
 
5.7%
s129683
 
5.2%
c91519
 
3.7%
l89424
 
3.6%
Other values (37)691833
27.9%
ValueCountFrequency (%)
149430
73.2%
/38069
 
18.6%
-16304
 
8.0%
(180
 
0.1%
)180
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2686482
100.0%

Most frequent character per block

ValueCountFrequency (%)
i288170
 
10.7%
e264106
 
9.8%
n241513
 
9.0%
t202968
 
7.6%
o172821
 
6.4%
r167803
 
6.2%
149430
 
5.6%
a142479
 
5.3%
s129683
 
4.8%
c91519
 
3.4%
Other values (42)835990
31.1%

CONTRIBUTING FACTOR VEHICLE 2
Categorical

MISSING

Distinct48
Distinct (%)< 0.1%
Missing28507
Missing (%)22.2%
Memory size7.5 MiB
Unspecified
84696 
Driver Inattention/Distraction
 
5936
Other Vehicular
 
1552
Following Too Closely
 
1521
Passing or Lane Usage Improper
 
853
Other values (43)
 
5302

Length

Max length53
Median length11
Mean length13.13994592
Min length5

Characters and Unicode

Total characters1312155
Distinct characters52
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)< 0.1%

Sample

1st rowUnspecified
2nd rowUnspecified
3rd rowDriver Inattention/Distraction
4th rowUnspecified
5th rowOther Vehicular
ValueCountFrequency (%)
Unspecified84696
66.0%
Driver Inattention/Distraction5936
 
4.6%
Other Vehicular1552
 
1.2%
Following Too Closely1521
 
1.2%
Passing or Lane Usage Improper853
 
0.7%
Failure to Yield Right-of-Way812
 
0.6%
Passing Too Closely601
 
0.5%
Unsafe Speed551
 
0.4%
Traffic Control Disregarded487
 
0.4%
Unsafe Lane Changing446
 
0.3%
Other values (38)2405
 
1.9%
(Missing)28507
 
22.2%
Histogram of lengths of the category
ValueCountFrequency (%)
unspecified84696
68.7%
driver6201
 
5.0%
inattention/distraction5936
 
4.8%
closely2122
 
1.7%
too2122
 
1.7%
other1567
 
1.3%
vehicular1552
 
1.3%
following1521
 
1.2%
passing1454
 
1.2%
lane1324
 
1.1%
Other values (81)14795
 
12.0%

Most occurring characters

ValueCountFrequency (%)
i206639
15.7%
e201390
15.3%
n119405
9.1%
s100187
7.6%
c94901
7.2%
d88543
6.7%
p88330
6.7%
f88109
6.7%
U87091
6.6%
t36316
 
2.8%
Other values (42)201244
15.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1150613
87.7%
Uppercase Letter129295
 
9.9%
Space Separator23430
 
1.8%
Other Punctuation7131
 
0.5%
Dash Punctuation1662
 
0.1%
Open Punctuation12
 
< 0.1%
Close Punctuation12
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
i206639
18.0%
e201390
17.5%
n119405
10.4%
s100187
8.7%
c94901
8.2%
d88543
7.7%
p88330
7.7%
f88109
7.7%
t36316
 
3.2%
r30681
 
2.7%
Other values (15)96112
8.4%
ValueCountFrequency (%)
U87091
67.4%
D12957
 
10.0%
I7506
 
5.8%
C3374
 
2.6%
T2921
 
2.3%
F2392
 
1.9%
O2223
 
1.7%
P2198
 
1.7%
V2146
 
1.7%
L1556
 
1.2%
Other values (12)4931
 
3.8%
ValueCountFrequency (%)
23430
100.0%
ValueCountFrequency (%)
/7131
100.0%
ValueCountFrequency (%)
-1662
100.0%
ValueCountFrequency (%)
(12
100.0%
ValueCountFrequency (%)
)12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1279908
97.5%
Common32247
 
2.5%

Most frequent character per script

ValueCountFrequency (%)
i206639
16.1%
e201390
15.7%
n119405
9.3%
s100187
7.8%
c94901
7.4%
d88543
6.9%
p88330
6.9%
f88109
6.9%
U87091
6.8%
t36316
 
2.8%
Other values (37)168997
13.2%
ValueCountFrequency (%)
23430
72.7%
/7131
 
22.1%
-1662
 
5.2%
(12
 
< 0.1%
)12
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1312155
100.0%

Most frequent character per block

ValueCountFrequency (%)
i206639
15.7%
e201390
15.3%
n119405
9.1%
s100187
7.6%
c94901
7.2%
d88543
6.7%
p88330
6.7%
f88109
6.7%
U87091
6.6%
t36316
 
2.8%
Other values (42)201244
15.3%

CONTRIBUTING FACTOR VEHICLE 3
Categorical

Distinct30
Distinct (%)0.2%
Missing116070
Missing (%)90.4%
Memory size4.3 MiB
Unspecified
11497 
Other Vehicular
 
247
Following Too Closely
 
222
Driver Inattention/Distraction
 
165
Pavement Slippery
 
26
Other values (25)
 
140
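
The percentages in this block do not all share a denominator. The figures are consistent with Missing (%) being taken over all 128367 rows and Distinct (%) over the non-missing rows only. That reading is inferred from the numbers themselves rather than stated in the report, and the sketch below simply checks it with the values shown here.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


n_rows = 128367      # observations in the dataset
n_missing = 116070   # missing values in this column
n_distinct = 30      # distinct non-missing values

n_present = n_rows - n_missing

missing_pct = percent(n_missing, n_rows)       # share of all rows
distinct_pct = percent(n_distinct, n_present)  # share of non-missing rows

# Dividing by all rows instead would give a much smaller figure.
distinct_pct_all_rows = percent(n_distinct, n_rows)

print(f"present={n_present} missing={missing_pct:.1f}% distinct={distinct_pct:.1f}%")
```

Keeping the two denominators apart matters for sparse columns like this one, where only about one row in ten carries a value.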

Length

Max length53
Median length11
Mean length11.66414573
Min length5

Characters and Unicode

Total characters143434
Distinct characters48
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique7 ?
Unique (%)0.1%

Sample

1st rowDriver Inattention/Distraction
2nd rowUnspecified
3rd rowUnspecified
4th rowUnspecified
5th rowFollowing Too Closely
ValueCountFrequency (%)
Unspecified11497
 
9.0%
Other Vehicular247
 
0.2%
Following Too Closely222
 
0.2%
Driver Inattention/Distraction165
 
0.1%
Pavement Slippery26
 
< 0.1%
Reaction to Uninvolved Vehicle22
 
< 0.1%
Unsafe Speed18
 
< 0.1%
Unsafe Lane Changing11
 
< 0.1%
Driver Inexperience10
 
< 0.1%
Obstruction/Debris9
 
< 0.1%
Other values (20)70
 
0.1%
(Missing)116070
90.4%
Histogram of lengths of the category
ValueCountFrequency (%)
unspecified11497
85.6%
other248
 
1.8%
vehicular247
 
1.8%
too231
 
1.7%
closely231
 
1.7%
following222
 
1.7%
driver175
 
1.3%
inattention/distraction165
 
1.2%
unsafe29
 
0.2%
to28
 
0.2%
Other values (51)357
 
2.7%

Most occurring characters

ValueCountFrequency (%)
e24508
17.1%
i24392
17.0%
n12656
8.8%
s12031
8.4%
c12015
8.4%
p11606
8.1%
d11579
8.1%
U11557
8.1%
f11554
8.1%
o1640
 
1.1%
Other values (38)9896
6.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter128481
89.6%
Uppercase Letter13603
 
9.5%
Space Separator1133
 
0.8%
Other Punctuation201
 
0.1%
Dash Punctuation14
 
< 0.1%
Open Punctuation1
 
< 0.1%
Close Punctuation1
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
e24508
19.1%
i24392
19.0%
n12656
9.9%
s12031
9.4%
c12015
9.4%
p11606
9.0%
d11579
9.0%
f11554
9.0%
o1640
 
1.3%
l1286
 
1.0%
Other values (13)5214
 
4.1%
ValueCountFrequency (%)
U11557
85.0%
D371
 
2.7%
V278
 
2.0%
O268
 
2.0%
C256
 
1.9%
T243
 
1.8%
F229
 
1.7%
I197
 
1.4%
P50
 
0.4%
S45
 
0.3%
Other values (10)109
 
0.8%
ValueCountFrequency (%)
1133
100.0%
ValueCountFrequency (%)
/201
100.0%
ValueCountFrequency (%)
-14
100.0%
ValueCountFrequency (%)
(1
100.0%
ValueCountFrequency (%)
)1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin142084
99.1%
Common1350
 
0.9%

Most frequent character per script

ValueCountFrequency (%)
e24508
17.2%
i24392
17.2%
n12656
8.9%
s12031
8.5%
c12015
8.5%
p11606
8.2%
d11579
8.1%
U11557
8.1%
f11554
8.1%
o1640
 
1.2%
Other values (33)8546
 
6.0%
ValueCountFrequency (%)
1133
83.9%
/201
 
14.9%
-14
 
1.0%
(1
 
0.1%
)1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII143434
100.0%

Most frequent character per block

ValueCountFrequency (%)
e24508
17.1%
i24392
17.0%
n12656
8.8%
s12031
8.4%
c12015
8.4%
p11606
8.1%
d11579
8.1%
U11557
8.1%
f11554
8.1%
o1640
 
1.1%
Other values (38)9896
6.9%

CONTRIBUTING FACTOR VEHICLE 4
Categorical

Distinct16
Distinct (%)0.5%
Missing125009
Missing (%)97.4%
Memory size4.0 MiB
Unspecified
3169 
Other Vehicular
 
76
Following Too Closely
 
49
Driver Inattention/Distraction
 
35
Pavement Slippery
 
7
Other values (11)
 
22

Length

Max length30
Median length11
Mean length11.51488982
Min length11

Characters and Unicode

Total characters38667
Distinct characters39
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4 ?
Unique (%)0.1%

Sample

1st rowUnspecified
2nd rowUnspecified
3rd rowUnspecified
4th rowUnspecified
5th rowUnspecified
ValueCountFrequency (%)
Unspecified3169
 
2.5%
Other Vehicular76
 
0.1%
Following Too Closely49
 
< 0.1%
Driver Inattention/Distraction35
 
< 0.1%
Pavement Slippery7
 
< 0.1%
Reaction to Uninvolved Vehicle4
 
< 0.1%
Unsafe Speed3
 
< 0.1%
Driver Inexperience3
 
< 0.1%
Outside Car Distraction2
 
< 0.1%
Obstruction/Debris2
 
< 0.1%
Other values (6)8
 
< 0.1%
(Missing)125009
97.4%
Histogram of lengths of the category
ValueCountFrequency (%)
unspecified3169
87.8%
vehicular76
 
2.1%
other76
 
2.1%
closely51
 
1.4%
too51
 
1.4%
following49
 
1.4%
driver38
 
1.1%
inattention/distraction35
 
1.0%
slippery7
 
0.2%
pavement7
 
0.2%
Other values (25)52
 
1.4%

Most occurring characters

ValueCountFrequency (%)
e6692
17.3%
i6649
17.2%
n3400
8.8%
c3298
8.5%
s3279
8.5%
p3191
8.3%
d3180
8.2%
U3178
8.2%
f3174
8.2%
o344
 
0.9%
Other values (29)2282
 
5.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter34730
89.8%
Uppercase Letter3645
 
9.4%
Space Separator253
 
0.7%
Other Punctuation39
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
e6692
19.3%
i6649
19.1%
n3400
9.8%
c3298
9.5%
s3279
9.4%
p3191
9.2%
d3180
9.2%
f3174
9.1%
o344
 
1.0%
l295
 
0.8%
Other values (13)1228
 
3.5%
ValueCountFrequency (%)
U3178
87.2%
D80
 
2.2%
O80
 
2.2%
V80
 
2.2%
C53
 
1.5%
T51
 
1.4%
F49
 
1.3%
I40
 
1.1%
P10
 
0.3%
S10
 
0.3%
Other values (4)14
 
0.4%
ValueCountFrequency (%)
253
100.0%
ValueCountFrequency (%)
/39
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin38375
99.2%
Common292
 
0.8%

Most frequent character per script

ValueCountFrequency (%)
e6692
17.4%
i6649
17.3%
n3400
8.9%
c3298
8.6%
s3279
8.5%
p3191
8.3%
d3180
8.3%
U3178
8.3%
f3174
8.3%
o344
 
0.9%
Other values (27)1990
 
5.2%
ValueCountFrequency (%)
253
86.6%
/39
 
13.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII38667
100.0%

Most frequent character per block

ValueCountFrequency (%)
e6692
17.3%
i6649
17.2%
n3400
8.8%
c3298
8.5%
s3279
8.5%
p3191
8.3%
d3180
8.2%
U3178
8.2%
f3174
8.2%
o344
 
0.9%
Other values (29)2282
 
5.9%

CONTRIBUTING FACTOR VEHICLE 5
Categorical

Distinct10
Distinct (%)1.0%
Missing127377
Missing (%)99.2%
Memory size4.0 MiB
Unspecified
926 
Other Vehicular
 
29
Following Too Closely
 
17
Driver Inattention/Distraction
 
6
Pavement Slippery
 
5
Other values (5)
 
7

Length

Max length30
Median length11
Mean length11.48989899
Min length11

Characters and Unicode

Total characters11375
Distinct characters34
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)0.3%

Sample

1st rowUnspecified
2nd rowUnspecified
3rd rowUnspecified
4th rowUnspecified
5th rowUnspecified
ValueCountFrequency (%)
Unspecified926
 
0.7%
Other Vehicular29
 
< 0.1%
Following Too Closely17
 
< 0.1%
Driver Inattention/Distraction6
 
< 0.1%
Pavement Slippery5
 
< 0.1%
Obstruction/Debris2
 
< 0.1%
Outside Car Distraction2
 
< 0.1%
Passing Too Closely1
 
< 0.1%
Driver Inexperience1
 
< 0.1%
Unsafe Speed1
 
< 0.1%
(Missing)127377
99.2%
Histogram of lengths of the category
ValueCountFrequency (%)
unspecified926
86.4%
vehicular29
 
2.7%
other29
 
2.7%
closely18
 
1.7%
too18
 
1.7%
following17
 
1.6%
driver7
 
0.7%
inattention/distraction6
 
0.6%
slippery5
 
0.5%
pavement5
 
0.5%
Other values (8)12
 
1.1%

Most occurring characters

ValueCountFrequency (%)
e1967
17.3%
i1940
17.1%
n980
8.6%
c966
8.5%
s961
8.4%
p938
8.2%
d929
8.2%
U927
8.1%
f927
8.1%
l104
 
0.9%
Other values (24)736
 
6.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter10205
89.7%
Uppercase Letter1080
 
9.5%
Space Separator82
 
0.7%
Other Punctuation8
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
e1967
19.3%
i1940
19.0%
n980
9.6%
c966
9.5%
s961
9.4%
p938
9.2%
d929
9.1%
f927
9.1%
l104
 
1.0%
o104
 
1.0%
Other values (12)389
 
3.8%
ValueCountFrequency (%)
U927
85.8%
O33
 
3.1%
V29
 
2.7%
C20
 
1.9%
T18
 
1.7%
F17
 
1.6%
D17
 
1.6%
I7
 
0.6%
P6
 
0.6%
S6
 
0.6%
ValueCountFrequency (%)
82
100.0%
ValueCountFrequency (%)
/8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin11285
99.2%
Common90
 
0.8%

Most frequent character per script

ValueCountFrequency (%)
e1967
17.4%
i1940
17.2%
n980
8.7%
c966
8.6%
s961
8.5%
p938
8.3%
d929
8.2%
U927
8.2%
f927
8.2%
l104
 
0.9%
Other values (22)646
 
5.7%
ValueCountFrequency (%)
82
91.1%
/8
 
8.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII11375
100.0%

Most frequent character per block

ValueCountFrequency (%)
e1967
17.3%
i1940
17.1%
n980
8.6%
c966
8.5%
s961
8.4%
p938
8.2%
d929
8.2%
U927
8.1%
f927
8.1%
l104
 
0.9%
Other values (24)736
 
6.5%

COLLISION_ID
Real number (ℝ≥0)

UNIQUE

Distinct: 128367
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4332874.476
Minimum: 4063247
Maximum: 4397407
Zeros: 0
Zeros (%): 0.0%
Memory size: 1003.0 KiB

Quantile statistics

Minimum: 4063247
5-th percentile: 4274921.3
Q1: 4300790.5
Median: 4332905
Q3: 4365013.5
95-th percentile: 4390704.7
Maximum: 4397407
Range: 334160
Interquartile range (IQR): 64223
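
The fractional quartiles (for example a Q1 ending in .5 for an integer column) suggest that quantiles are interpolated linearly between neighbouring ranks, which is the pandas default. The sketch below shows that interpolation rule and rechecks the derived Range and IQR from the reported values; the `quantile` helper is illustrative and assumes its input is already sorted.

```python
import math


def quantile(sorted_values, q):
    """Quantile with linear interpolation between neighbouring ranks."""
    pos = (len(sorted_values) - 1) * q   # fractional rank of the quantile
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


# Small illustration of the rule on toy data.
toy_q1 = quantile([1, 2, 3, 4], 0.25)

# Derived figures, recomputed from the values reported for COLLISION_ID.
minimum, maximum = 4063247, 4397407
q1, q3 = 4300790.5, 4365013.5

value_range = maximum - minimum  # reported as "Range"
iqr = q3 - q1                    # reported as "Interquartile range (IQR)"

print(toy_q1, value_range, iqr)
```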

Descriptive statistics

Standard deviation: 37134.97788
Coefficient of variation (CV): 0.008570517814
Kurtosis: -1.159355039
Mean: 4332874.476
Median Absolute Deviation (MAD): 32112
Skewness: -0.008462197276
Sum: 5.561980978 × 10^11
Variance: 1379006582
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
4270043                1       < 0.1%
4322708                1       < 0.1%
4279683                1       < 0.1%
4289924                1       < 0.1%
4291973                1       < 0.1%
4285830                1       < 0.1%
4287879                1       < 0.1%
4273548                1       < 0.1%
4275597                1       < 0.1%
4269454                1       < 0.1%
Other values (128357)  128357  > 99.9%

Value    Count  Frequency (%)
4063247  1      < 0.1%
4073803  1      < 0.1%
4267700  1      < 0.1%
4267732  1      < 0.1%
4267823  1      < 0.1%
4267839  1      < 0.1%
4267851  1      < 0.1%
4267864  1      < 0.1%
4267865  1      < 0.1%
4267868  1      < 0.1%

Value    Count  Frequency (%)
4397407  1      < 0.1%
4397405  1      < 0.1%
4397404  1      < 0.1%
4397403  1      < 0.1%
4397401  1      < 0.1%
4397396  1      < 0.1%
4397395  1      < 0.1%
4397394  1      < 0.1%
4397390  1      < 0.1%
4397386  1      < 0.1%

VEHICLE TYPE CODE 1
Categorical

HIGH CARDINALITY

Distinct414
Distinct (%)0.3%
Missing1259
Missing (%)1.0%
Memory size8.9 MiB
Sedan
60122 
Station Wagon/Sport Utility Vehicle
46343 
Taxi
 
4122
Pick-up Truck
 
2996
Box Truck
 
2353
Other values (409)
11172 

Length

Max length35
Median length5
Mean length16.43344243
Min length1

Characters and Unicode

Total characters2088822
Distinct characters65
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique261 ?
Unique (%)0.2%

Sample

1st rowSedan
2nd rowTaxi
3rd rowStation Wagon/Sport Utility Vehicle
4th rowSedan
5th rowStation Wagon/Sport Utility Vehicle
ValueCountFrequency (%)
Sedan60122
46.8%
Station Wagon/Sport Utility Vehicle46343
36.1%
Taxi4122
 
3.2%
Pick-up Truck2996
 
2.3%
Box Truck2353
 
1.8%
Bus1725
 
1.3%
Bike1570
 
1.2%
Tractor Truck Diesel1038
 
0.8%
Motorcycle828
 
0.6%
Van766
 
0.6%
Other values (404)5245
 
4.1%
(Missing)1259
 
1.0%
Histogram of lengths of the category
ValueCountFrequency (%)
sedan60336
21.8%
vehicle46358
16.8%
utility46355
16.8%
station46343
16.8%
wagon/sport46343
16.8%
truck6833
 
2.5%
taxi4122
 
1.5%
pick-up2997
 
1.1%
box2369
 
0.9%
bus1760
 
0.6%
Other values (317)12413
 
4.5%

Most occurring characters

ValueCountFrequency (%)
t235245
11.3%
i196355
 
9.4%
e161903
 
7.8%
a161353
 
7.7%
n155216
 
7.4%
S153436
 
7.3%
149123
 
7.1%
o146949
 
7.0%
l96430
 
4.6%
d61058
 
2.9%
Other values (55)571754
27.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1563205
74.8%
Uppercase Letter325896
 
15.6%
Space Separator149123
 
7.1%
Other Punctuation46477
 
2.2%
Dash Punctuation3844
 
0.2%
Decimal Number277
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
S153436
47.1%
V47198
 
14.5%
U46705
 
14.3%
W46518
 
14.3%
T12518
 
3.8%
B6525
 
2.0%
P3410
 
1.0%
D1553
 
0.5%
M1479
 
0.5%
A1234
 
0.4%
Other values (16)5320
 
1.6%
ValueCountFrequency (%)
t235245
15.0%
i196355
12.6%
e161903
10.4%
a161353
10.3%
n155216
9.9%
o146949
9.4%
l96430
6.2%
d61058
 
3.9%
c60415
 
3.9%
r59030
 
3.8%
Other values (14)229251
14.7%
Value  Count  Frequency (%)
4      199    71.8%
3      32     11.6%
2      19     6.9%
1      8      2.9%
5      6      2.2%
0      5      1.8%
7      3      1.1%
8      3      1.1%
6      2      0.7%
ValueCountFrequency (%)
/46472
> 99.9%
#2
 
< 0.1%
.2
 
< 0.1%
,1
 
< 0.1%
ValueCountFrequency (%)
149123
100.0%
ValueCountFrequency (%)
-3844
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1889101
90.4%
Common199721
 
9.6%

Most frequent character per script

ValueCountFrequency (%)
t235245
12.5%
i196355
10.4%
e161903
 
8.6%
a161353
 
8.5%
n155216
 
8.2%
S153436
 
8.1%
o146949
 
7.8%
l96430
 
5.1%
d61058
 
3.2%
c60415
 
3.2%
Other values (40)460741
24.4%
ValueCountFrequency (%)
149123
74.7%
/46472
 
23.3%
-3844
 
1.9%
4199
 
0.1%
332
 
< 0.1%
219
 
< 0.1%
18
 
< 0.1%
56
 
< 0.1%
05
 
< 0.1%
73
 
< 0.1%
Other values (5)10
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2088822
100.0%

Most frequent character per block

ValueCountFrequency (%)
t235245
11.3%
i196355
 
9.4%
e161903
 
7.8%
a161353
 
7.7%
n155216
 
7.4%
S153436
 
7.3%
149123
 
7.1%
o146949
 
7.0%
l96430
 
4.6%
d61058
 
2.9%
Other values (55)571754
27.4%

VEHICLE TYPE CODE 2
Categorical

HIGH CARDINALITY
MISSING

Distinct427
Distinct (%)0.5%
Missing39742
Missing (%)31.0%
Memory size7.4 MiB
Sedan
38551 
Station Wagon/Sport Utility Vehicle
29555 
Bike
4082 
Box Truck
 
2575
Pick-up Truck
 
2413
Other values (422)
11449 

Length

Max length38
Median length5
Mean length15.6534725
Min length2

Characters and Unicode

Total characters1387289
Distinct characters63
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique275 ?
Unique (%)0.3%

Sample

1st rowPick-up Truck
2nd rowTaxi
3rd rowSedan
4th rowStation Wagon/Sport Utility Vehicle
5th rowSedan
ValueCountFrequency (%)
Sedan38551
30.0%
Station Wagon/Sport Utility Vehicle29555
23.0%
Bike4082
 
3.2%
Box Truck2575
 
2.0%
Pick-up Truck2413
 
1.9%
Taxi2275
 
1.8%
Bus1466
 
1.1%
Tractor Truck Diesel994
 
0.8%
E-Scooter723
 
0.6%
Motorcycle717
 
0.6%
Other values (417)5274
 
4.1%
(Missing)39742
31.0%
Histogram of lengths of the category
ValueCountFrequency (%)
sedan38680
20.7%
vehicle29569
15.8%
utility29560
15.8%
wagon/sport29555
15.8%
station29555
15.8%
truck6419
 
3.4%
bike4088
 
2.2%
box2592
 
1.4%
pick-up2414
 
1.3%
taxi2275
 
1.2%
Other values (323)12042
 
6.4%

Most occurring characters

ValueCountFrequency (%)
t151455
 
10.9%
i129566
 
9.3%
e109191
 
7.9%
a103687
 
7.5%
n99471
 
7.2%
S98626
 
7.1%
98126
 
7.1%
o97153
 
7.0%
l62267
 
4.5%
c42325
 
3.1%
Other values (53)395422
28.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1034963
74.6%
Uppercase Letter220429
 
15.9%
Space Separator98126
 
7.1%
Other Punctuation29676
 
2.1%
Dash Punctuation3919
 
0.3%
Decimal Number175
 
< 0.1%
Modifier Symbol1
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
S98626
44.7%
V30279
 
13.7%
U29876
 
13.6%
W29745
 
13.5%
T10256
 
4.7%
B9323
 
4.2%
P2783
 
1.3%
E1825
 
0.8%
D1521
 
0.7%
M1355
 
0.6%
Other values (16)4840
 
2.2%
ValueCountFrequency (%)
t151455
14.6%
i129566
12.5%
e109191
10.6%
a103687
10.0%
n99471
9.6%
o97153
9.4%
l62267
 
6.0%
c42325
 
4.1%
r41728
 
4.0%
d39386
 
3.8%
Other values (14)158734
15.3%
Value  Count  Frequency (%)
4      124    70.9%
3      26     14.9%
2      10     5.7%
0      5      2.9%
1      5      2.9%
6      3      1.7%
8      1      0.6%
5      1      0.6%
ValueCountFrequency (%)
/29675
> 99.9%
.1
 
< 0.1%
ValueCountFrequency (%)
-3919
100.0%
ValueCountFrequency (%)
98126
100.0%
ValueCountFrequency (%)
`1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1255392
90.5%
Common131897
 
9.5%

Most frequent character per script

ValueCountFrequency (%)
t151455
12.1%
i129566
10.3%
e109191
 
8.7%
a103687
 
8.3%
n99471
 
7.9%
S98626
 
7.9%
o97153
 
7.7%
l62267
 
5.0%
c42325
 
3.4%
r41728
 
3.3%
Other values (40)319923
25.5%
ValueCountFrequency (%)
98126
74.4%
/29675
 
22.5%
-3919
 
3.0%
4124
 
0.1%
326
 
< 0.1%
210
 
< 0.1%
05
 
< 0.1%
15
 
< 0.1%
63
 
< 0.1%
81
 
< 0.1%
Other values (3)3
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1387289
100.0%

Most frequent character per block

ValueCountFrequency (%)
t151455
 
10.9%
i129566
 
9.3%
e109191
 
7.9%
a103687
 
7.5%
n99471
 
7.2%
S98626
 
7.1%
98126
 
7.1%
o97153
 
7.0%
l62267
 
4.5%
c42325
 
3.1%
Other values (53)395422
28.5%

VEHICLE TYPE CODE 3
Categorical

HIGH CARDINALITY
MISSING

Distinct77
Distinct (%)0.7%
Missing116772
Missing (%)91.0%
Memory size4.4 MiB
Sedan
5728 
Station Wagon/Sport Utility Vehicle
4788 
Pick-up Truck
 
264
Taxi
 
196
Box Truck
 
118
Other values (72)
 
501

Length

Max length35
Median length5
Mean length17.74877102
Min length2

Characters and Unicode

Total characters205797
Distinct characters53
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique38 ?
Unique (%)0.3%

Sample

1st row: Pick-up Truck
2nd row: Sedan
3rd row: Station Wagon/Sport Utility Vehicle
4th row: Sedan
5th row: Pick-up Truck
Value | Count | Frequency (%)
Sedan | 5728 | 4.5%
Station Wagon/Sport Utility Vehicle | 4788 | 3.7%
Pick-up Truck | 264 | 0.2%
Taxi | 196 | 0.2%
Box Truck | 118 | 0.1%
Bus | 67 | 0.1%
Tractor Truck Diesel | 60 | < 0.1%
Van | 54 | < 0.1%
Bike | 51 | < 0.1%
Motorcycle | 36 | < 0.1%
Other values (67) | 233 | 0.2%
(Missing) | 116772 | 91.0%
Histogram of lengths of the category
Value | Count | Frequency (%)
sedan | 5741 | 21.6%
vehicle | 4791 | 18.0%
station | 4788 | 18.0%
utility | 4788 | 18.0%
wagon/sport | 4788 | 18.0%
truck | 464 | 1.7%
pick-up | 264 | 1.0%
taxi | 196 | 0.7%
box | 121 | 0.5%
bus | 69 | 0.3%
Other values (79) | 596 | 2.2%

Most occurring characters

Value | Count | Frequency (%)
t | 24135 | 11.7%
i | 19797 | 9.6%
e | 15740 | 7.6%
a | 15719 | 7.6%
n | 15438 | 7.5%
S | 15326 | 7.4%
(space) | 15011 | 7.3%
o | 14744 | 7.2%
l | 9766 | 4.7%
d | 5782 | 2.8%
Other values (43) | 54339 | 26.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 154137 | 74.9%
Uppercase Letter | 31549 | 15.3%
Space Separator | 15011 | 7.3%
Other Punctuation | 4796 | 2.3%
Dash Punctuation | 288 | 0.1%
Decimal Number | 16 | < 0.1%

Most frequent character per category

ValueCountFrequency (%)
t24135
15.7%
i19797
12.8%
e15740
10.2%
a15719
10.2%
n15438
10.0%
o14744
9.6%
l9766
6.3%
d5782
 
3.8%
c5692
 
3.7%
r5560
 
3.6%
Other values (14)21764
14.1%
ValueCountFrequency (%)
S15326
48.6%
V4852
 
15.4%
U4811
 
15.2%
W4804
 
15.2%
T750
 
2.4%
P287
 
0.9%
B270
 
0.9%
D76
 
0.2%
C66
 
0.2%
M65
 
0.2%
Other values (13)242
 
0.8%
ValueCountFrequency (%)
413
81.2%
32
 
12.5%
21
 
6.2%
ValueCountFrequency (%)
-288
100.0%
ValueCountFrequency (%)
15011
100.0%
ValueCountFrequency (%)
/4796
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 185686 | 90.2%
Common | 20111 | 9.8%

Most frequent character per script

ValueCountFrequency (%)
t24135
13.0%
i19797
10.7%
e15740
 
8.5%
a15719
 
8.5%
n15438
 
8.3%
S15326
 
8.3%
o14744
 
7.9%
l9766
 
5.3%
d5782
 
3.1%
c5692
 
3.1%
Other values (37)43547
23.5%
ValueCountFrequency (%)
15011
74.6%
/4796
 
23.8%
-288
 
1.4%
413
 
0.1%
32
 
< 0.1%
21
 
< 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 205797 | 100.0%

Most frequent character per block

ValueCountFrequency (%)
t24135
11.7%
i19797
 
9.6%
e15740
 
7.6%
a15719
 
7.6%
n15438
 
7.5%
S15326
 
7.4%
15011
 
7.3%
o14744
 
7.2%
l9766
 
4.7%
d5782
 
2.8%
Other values (43)54339
26.4%

VEHICLE TYPE CODE 4
Categorical

MISSING

Distinct: 33
Distinct (%): 1.0%
Missing: 125156
Missing (%): 97.5%
Memory size: 4.0 MiB

Sedan: 1660
Station Wagon/Sport Utility Vehicle: 1319
Pick-up Truck: 73
Taxi: 47
Box Truck: 16
Other values (28): 96

Length

Max length: 35
Median length: 5
Mean length: 17.58704453
Min length: 2

Characters and Unicode

Total characters: 56472
Distinct characters: 48
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): 0.4%

Sample

1st row: Station Wagon/Sport Utility Vehicle
2nd row: Sedan
3rd row: Sedan
4th row: Pick-up Truck
5th row: Sedan
Value | Count | Frequency (%)
Sedan | 1660 | 1.3%
Station Wagon/Sport Utility Vehicle | 1319 | 1.0%
Pick-up Truck | 73 | 0.1%
Taxi | 47 | < 0.1%
Box Truck | 16 | < 0.1%
Bus | 15 | < 0.1%
Motorcycle | 9 | < 0.1%
Van | 9 | < 0.1%
Convertible | 9 | < 0.1%
Bike | 8 | < 0.1%
Other values (23) | 46 | < 0.1%
(Missing) | 125156 | 97.5%
Histogram of lengths of the category
Value | Count | Frequency (%)
sedan | 1662 | 22.8%
utility | 1319 | 18.1%
vehicle | 1319 | 18.1%
station | 1319 | 18.1%
wagon/sport | 1319 | 18.1%
truck | 97 | 1.3%
pick-up | 74 | 1.0%
taxi | 47 | 0.6%
box | 17 | 0.2%
bus | 15 | 0.2%
Other values (32) | 101 | 1.4%

Most occurring characters

Value | Count | Frequency (%)
t | 6629 | 11.7%
i | 5427 | 9.6%
a | 4380 | 7.8%
e | 4367 | 7.7%
n | 4324 | 7.7%
S | 4301 | 7.6%
(space) | 4078 | 7.2%
o | 4022 | 7.1%
l | 2670 | 4.7%
d | 1667 | 3.0%
Other values (38) | 14607 | 25.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 42351 | 75.0%
Uppercase Letter | 8644 | 15.3%
Space Separator | 4078 | 7.2%
Other Punctuation | 1319 | 2.3%
Dash Punctuation | 78 | 0.1%
Decimal Number | 2 | < 0.1%

Most frequent character per category

ValueCountFrequency (%)
S4301
49.8%
V1331
 
15.4%
U1322
 
15.3%
W1319
 
15.3%
T156
 
1.8%
P79
 
0.9%
B43
 
0.5%
C17
 
0.2%
D16
 
0.2%
M12
 
0.1%
Other values (12)48
 
0.6%
ValueCountFrequency (%)
t6629
15.7%
i5427
12.8%
a4380
10.3%
e4367
10.3%
n4324
10.2%
o4022
9.5%
l2670
6.3%
d1667
 
3.9%
c1520
 
3.6%
r1469
 
3.5%
Other values (12)5876
13.9%
ValueCountFrequency (%)
4078
100.0%
ValueCountFrequency (%)
/1319
100.0%
ValueCountFrequency (%)
-78
100.0%
ValueCountFrequency (%)
42
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 50995 | 90.3%
Common | 5477 | 9.7%

Most frequent character per script

ValueCountFrequency (%)
t6629
13.0%
i5427
10.6%
a4380
 
8.6%
e4367
 
8.6%
n4324
 
8.5%
S4301
 
8.4%
o4022
 
7.9%
l2670
 
5.2%
d1667
 
3.3%
c1520
 
3.0%
Other values (34)11688
22.9%
ValueCountFrequency (%)
4078
74.5%
/1319
 
24.1%
-78
 
1.4%
42
 
< 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 56472 | 100.0%

Most frequent character per block

ValueCountFrequency (%)
t6629
11.7%
i5427
 
9.6%
a4380
 
7.8%
e4367
 
7.7%
n4324
 
7.7%
S4301
 
7.6%
4078
 
7.2%
o4022
 
7.1%
l2670
 
4.7%
d1667
 
3.0%
Other values (38)14607
25.9%

VEHICLE TYPE CODE 5
Categorical

MISSING

Distinct: 19
Distinct (%): 2.0%
Missing: 127407
Missing (%): 99.3%
Memory size: 4.0 MiB

Sedan: 484
Station Wagon/Sport Utility Vehicle: 399
Pick-up Truck: 22
Taxi: 14
Van: 9
Other values (14): 32

Length

Max length: 35
Median length: 5
Mean length: 17.734375
Min length: 2

Characters and Unicode

Total characters: 17025
Distinct characters: 42
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 0.7%

Sample

1st row: Station Wagon/Sport Utility Vehicle
2nd row: Sedan
3rd row: Sedan
4th row: Station Wagon/Sport Utility Vehicle
5th row: Station Wagon/Sport Utility Vehicle
Value | Count | Frequency (%)
Sedan | 484 | 0.4%
Station Wagon/Sport Utility Vehicle | 399 | 0.3%
Pick-up Truck | 22 | < 0.1%
Taxi | 14 | < 0.1%
Van | 9 | < 0.1%
PK | 5 | < 0.1%
Box Truck | 5 | < 0.1%
Motorcycle | 4 | < 0.1%
Bus | 3 | < 0.1%
Tractor Truck Diesel | 3 | < 0.1%
Other values (9) | 12 | < 0.1%
(Missing) | 127407 | 99.3%
Histogram of lengths of the category
Value | Count | Frequency (%)
sedan | 484 | 22.1%
station | 399 | 18.2%
vehicle | 399 | 18.2%
utility | 399 | 18.2%
wagon/sport | 399 | 18.2%
truck | 32 | 1.5%
pick-up | 22 | 1.0%
taxi | 14 | 0.6%
van | 10 | 0.5%
box | 7 | 0.3%
Other values (12) | 28 | 1.3%

Most occurring characters

Value | Count | Frequency (%)
t | 2012 | 11.8%
i | 1640 | 9.6%
a | 1309 | 7.7%
e | 1303 | 7.7%
n | 1294 | 7.6%
S | 1283 | 7.5%
(space) | 1233 | 7.2%
o | 1227 | 7.2%
l | 809 | 4.8%
d | 484 | 2.8%
Other values (32) | 4431 | 26.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 12767 | 75.0%
Uppercase Letter | 2603 | 15.3%
Space Separator | 1233 | 7.2%
Other Punctuation | 399 | 2.3%
Dash Punctuation | 23 | 0.1%

Most frequent character per category

ValueCountFrequency (%)
t2012
15.8%
i1640
12.8%
a1309
10.3%
e1303
10.2%
n1294
10.1%
o1227
9.6%
l809
6.3%
d484
 
3.8%
c467
 
3.7%
r451
 
3.5%
Other values (11)1771
13.9%
ValueCountFrequency (%)
S1283
49.3%
V409
 
15.7%
W399
 
15.3%
U399
 
15.3%
T48
 
1.8%
P27
 
1.0%
B11
 
0.4%
M6
 
0.2%
K5
 
0.2%
D5
 
0.2%
Other values (8)11
 
0.4%
ValueCountFrequency (%)
1233
100.0%
ValueCountFrequency (%)
/399
100.0%
ValueCountFrequency (%)
-23
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 15370 | 90.3%
Common | 1655 | 9.7%

Most frequent character per script

ValueCountFrequency (%)
t2012
13.1%
i1640
10.7%
a1309
 
8.5%
e1303
 
8.5%
n1294
 
8.4%
S1283
 
8.3%
o1227
 
8.0%
l809
 
5.3%
d484
 
3.1%
c467
 
3.0%
Other values (29)3542
23.0%
ValueCountFrequency (%)
1233
74.5%
/399
 
24.1%
-23
 
1.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 17025 | 100.0%

Most frequent character per block

ValueCountFrequency (%)
t2012
11.8%
i1640
 
9.6%
a1309
 
7.7%
e1303
 
7.7%
n1294
 
7.6%
S1283
 
7.5%
1233
 
7.2%
o1227
 
7.2%
l809
 
4.8%
d484
 
2.8%
Other values (32)4431
26.0%

Interactions

[42 interaction plots (Matplotlib v3.3.2); images not reproduced]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only its direction does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
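The calculation described above can be sketched in plain Python as follows. The function name `pearson_r` and the two short input lists are made up for illustration; they are not taken from the dataset.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so unnormalised sums are sufficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical injury counts for six collisions (illustrative values only).
persons_injured = [0, 1, 1, 0, 2, 3]
motorist_injured = [0, 0, 1, 0, 2, 2]
print(pearson_r(persons_injured, motorist_injured))
```

Shifting or positively rescaling either list leaves the result unchanged, which is the location and scale invariance mentioned in the description.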

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
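A minimal sketch of that recipe: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is an assumption here (it is the usual convention), and the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of X and Y."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# A cubic relationship is nonlinear but monotonic, so the ranks agree exactly.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```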

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
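The pair-counting definition can be written out directly. The sketch below implements exactly that formula (often called tau-a); statistical libraries such as SciPy default to the tau-b variant, which additionally adjusts for ties, so results may differ on data with many tied values. The function name is an assumption for this example.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        pairs += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    if pairs == 0:
        raise ValueError("need at least two observations")
    return (concordant - discordant) / pairs

# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

This brute-force version compares every pair, so it is quadratic in the number of observations and only suitable for small samples.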

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
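As a hedged sketch, the bias-corrected coefficient can be computed from a contingency table as below. It assumes the commonly cited form of Bergsma's correction (shrinking phi-squared and the effective row and column counts by terms in 1/(n-1)); whether the report generator matches this term for term is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections: subtract the expected bias and shrink the dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a variable with a single level carries no association
    return math.sqrt(phi2_corr / denom)

# Illustrative labels: vehicle type fully determined by a made-up group flag.
group = ["a"] * 50 + ["b"] * 50
vehicle = ["Sedan"] * 50 + ["Taxi"] * 50
print(cramers_v_corrected(group, vehicle))
```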

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
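One way to picture nullity correlation is as the Pearson correlation between per-column "value is present" indicators. The sketch below assumes rows stored as dictionaries with `None` for missing cells; the helper name and the four example rows are invented for illustration.

```python
import math

def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns None when a column is always present or always missing, because
    the correlation is undefined for a constant indicator.
    """
    a = [0 if row.get(col_a) is None else 1 for row in rows]
    b = [0 if row.get(col_b) is None else 1 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)

# Made-up rows in which LATITUDE and LONGITUDE are always missing together.
rows = [
    {"LATITUDE": 40.67, "LONGITUDE": -73.84, "BOROUGH": None},
    {"LATITUDE": None, "LONGITUDE": None, "BOROUGH": "BROOKLYN"},
    {"LATITUDE": 40.70, "LONGITUDE": -73.94, "BOROUGH": "BROOKLYN"},
    {"LATITUDE": None, "LONGITUDE": None, "BOROUGH": None},
]
print(nullity_correlation(rows, "LATITUDE", "LONGITUDE"))
```

A value near +1 means two columns tend to be missing in the same rows, and a value near -1 means one tends to be present exactly when the other is missing.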
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

(index) | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5
0 | 01/02/2020 | 0:00 | NaN | NaN | NaN | NaN | NaN | CROSS ISLAND PARKWAY | NaN | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Tire Failure/Inadequate | NaN | NaN | NaN | NaN | 4267700 | Sedan | NaN | NaN | NaN | NaN
1 | 01/02/2020 | 12:57 | NaN | NaN | NaN | NaN | NaN | W 57 & 8th Ave | W 57 | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4268255 | Taxi | Pick-up Truck | NaN | NaN | NaN
2 | 01/02/2020 | 15:00 | NaN | NaN | 40.668266 | -73.842140 | (40.668266, -73.84214) | CROSS BAY BOULEVARD | SOUTH CONDUIT AVENUE | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | NaN | NaN | NaN | 4268222 | Station Wagon/Sport Utility Vehicle | Taxi | NaN | NaN | NaN
3 | 01/02/2020 | 15:10 | BROOKLYN | 11206.0 | 40.700527 | -73.941610 | (40.700527, -73.94161) | NaN | NaN | 760 BROADWAY | 1.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Pedestrian/Bicyclist/Other Pedestrian Error/Confusion | NaN | NaN | NaN | NaN | 4268246 | Sedan | NaN | NaN | NaN | NaN
4 | 01/02/2020 | 17:30 | NaN | NaN | NaN | NaN | NaN | NORTHERN BOULEVARD | 68 STREET | NaN | 1.0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | Driver Inattention/Distraction | NaN | NaN | NaN | 4268708 | Station Wagon/Sport Utility Vehicle | Sedan | NaN | NaN | NaN
5 | 01/02/2020 | 20:45 | BRONX | 10460.0 | 40.843033 | -73.881805 | (40.843033, -73.881805) | NaN | NaN | 948 EAST 179 STREET | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4268164 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
6 | 01/02/2020 | 10:10 | MANHATTAN | 10022.0 | 40.759740 | -73.974230 | (40.75974, -73.97423) | EAST 53 STREET | MADISON AVENUE | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Other Vehicular | Other Vehicular | NaN | NaN | NaN | 4268253 | Sedan | Sedan | NaN | NaN | NaN
7 | 01/02/2020 | 17:18 | NaN | NaN | 40.749550 | -74.006540 | (40.74955, -74.00654) | 11 AVENUE | NaN | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing or Lane Usage Improper | Unspecified | NaN | NaN | NaN | 4268097 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
8 | 01/02/2020 | 18:50 | NaN | NaN | 40.811638 | -73.931600 | (40.811638, -73.9316) | MAJOR DEEGAN EXPRESSWAY | NaN | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unsafe Speed | Reaction to Uninvolved Vehicle | NaN | NaN | NaN | 4268521 | Sedan | Sedan | NaN | NaN | NaN
9 | 01/02/2020 | 13:00 | BROOKLYN | 11226.0 | 40.653328 | -73.959404 | (40.653328, -73.959404) | NaN | NaN | 793 FLATBUSH AVENUE | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4268069 | Sedan | NaN | NaN | NaN | NaN

Last rows

(index) | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5
128357 | 03/06/2021 | 23:50 | MANHATTAN | 10013.0 | 40.721350 | -74.004650 | (40.72135, -74.00465) | CANAL STREET | WEST BROADWAY | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4396733 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
128358 | 03/06/2021 | 9:15 | BROOKLYN | 11218.0 | 40.649940 | -73.974010 | (40.64994, -73.97401) | NaN | NaN | 31 OCEAN PARKWAY | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Other Vehicular | Unspecified | NaN | NaN | NaN | 4396709 | Sedan | NaN | NaN | NaN | NaN
128359 | 03/06/2021 | 15:21 | MANHATTAN | 10024.0 | 40.783974 | -73.970310 | (40.783974, -73.97031) | CENTRAL PARK WEST | WEST 84 STREET | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Driver Inattention/Distraction | NaN | NaN | NaN | 4396899 | Van | Sedan | NaN | NaN | NaN
128360 | 03/06/2021 | 14:40 | NaN | NaN | NaN | NaN | NaN | BRUCKNER EXPRESSWAY RAMP | NaN | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unsafe Lane Changing | Unspecified | NaN | NaN | NaN | 4397161 | Taxi | Sedan | NaN | NaN | NaN
128361 | 03/06/2021 | 6:14 | BRONX | 10452.0 | 40.841960 | -73.915306 | (40.84196, -73.915306) | EAST 172 STREET | TOWNSEND AVENUE | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | NaN | NaN | NaN | 4396610 | Sedan | NaN | NaN | NaN | NaN
128362 | 03/06/2021 | 17:00 | MANHATTAN | 10031.0 | 40.822834 | -73.953710 | (40.822834, -73.95371) | NaN | NaN | 610 WEST 139 STREET | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4396982 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN
128363 | 03/06/2021 | 19:30 | BRONX | 10466.0 | 40.887096 | -73.860870 | (40.887096, -73.86087) | WHITE PLAINS ROAD | EAST 224 STREET | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Driver Inattention/Distraction | NaN | NaN | NaN | 4396676 | Sedan | NaN | NaN | NaN | NaN
128364 | 03/06/2021 | 1:30 | BRONX | 10466.0 | 40.883410 | -73.837800 | (40.88341, -73.8378) | NaN | NaN | 3601 PALMER AVENUE | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4396673 | Dump | NaN | NaN | NaN | NaN
128365 | 03/06/2021 | 16:30 | QUEENS | 11101.0 | 40.737537 | -73.929955 | (40.737537, -73.929955) | HUNTERS POINT AVENUE | GREENPOINT AVENUE | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Passing or Lane Usage Improper | NaN | NaN | NaN | 4396622 | Station Wagon/Sport Utility Vehicle | Sedan | NaN | NaN | NaN
128366 | 03/06/2021 | 18:15 | BROOKLYN | 11212.0 | 40.654137 | -73.912340 | (40.654137, -73.91234) | LINDEN BOULEVARD | ROCKAWAY PARKWAY | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4397200 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN